Internal Evaluation of Unsupervised Outlier Detection
ACM Transactions on Knowledge Discovery from Data (IF 3.6). Pub Date: 2020-06-29. DOI: 10.1145/3394053
Henrique O. Marques, Ricardo J. G. B. Campello, Jörg Sander, Arthur Zimek

Although there is a large and growing literature on unsupervised outlier detection, the unsupervised evaluation of outlier detection results remains virtually untouched. So-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or merely compare (in relative terms) the solutions produced by different algorithms, or by different parameterizations of a given algorithm, when no labeled data are available. In unsupervised cluster analysis, indexes for the internal evaluation and validation of clustering solutions have been conceived and shown to be very useful; in the outlier detection domain, by contrast, this problem has been notably overlooked. Here we discuss this problem and provide a solution for the internal evaluation of outlier detection results. Specifically, we describe an index called Internal, Relative Evaluation of Outlier Solutions (IREOS) that can evaluate and compare different candidate outlier detection solutions. The index is initially designed to evaluate binary solutions only, referred to as top-n outlier detection results. We then extend IREOS to the general case of non-binary solutions, consisting of outlier detection scorings. We also statistically adjust IREOS for chance and extensively evaluate it in several experiments involving different collections of synthetic and real datasets.
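The abstract states that IREOS is adjusted for chance but does not give the formula. A common way to adjust a bounded index for chance, familiar from the adjusted Rand index, is (I - E[I]) / (I_max - E[I]), where E[I] is the value the index is expected to take for random solutions. The Python sketch below assumes that form, with I_max = 1 and E[I] estimated by Monte Carlo sampling over random top-n solutions that flag the same number of points. The names `toy_separability_index` and `adjusted_for_chance` are hypothetical, and the toy index is a crude nearest-neighbor stand-in, not the IREOS definition from the paper.

```python
import math
import random


def toy_separability_index(data, outliers):
    """Hypothetical stand-in index in [0, 1): higher when each flagged point
    lies far from its nearest unflagged point. This is NOT the IREOS index."""
    if not outliers:
        raise ValueError("a top-n solution must flag at least one point")
    inliers = [p for i, p in enumerate(data) if i not in outliers]
    scores = []
    for i in outliers:
        # Distance from the flagged point to its nearest unflagged point.
        d_nn = min(math.dist(data[i], q) for q in inliers)
        # Squash the distance into [0, 1) so the index is bounded by 1.
        scores.append(1.0 - math.exp(-d_nn))
    return sum(scores) / len(scores)


def adjusted_for_chance(index_fn, data, outliers, n_random=200, seed=0):
    """Assumed adjustment (I - E[I]) / (1 - E[I]), where E[I] is a Monte Carlo
    estimate over random top-n solutions of the same size as `outliers`."""
    rng = random.Random(seed)
    n = len(outliers)
    observed = index_fn(data, outliers)
    random_values = [
        index_fn(data, set(rng.sample(range(len(data)), n)))
        for _ in range(n_random)
    ]
    expected = sum(random_values) / n_random
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical data: a tight cluster plus two distant points (indices 5, 6).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
            (3.0, 3.0), (-3.0, 2.5)]
    good = {5, 6}  # flags the two distant points
    poor = {0, 1}  # flags two cluster members
    print(adjusted_for_chance(toy_separability_index, data, good))
    print(adjusted_for_chance(toy_separability_index, data, poor))
```

On this hypothetical data, flagging the two distant points yields an adjusted value close to 1, while flagging two cluster members yields a negative value, i.e. a solution scoring worse than a random one of the same size. Estimating E[I] by sampling keeps the adjustment independent of the particular index; an index with a known closed-form expectation would not need the Monte Carlo step.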

Updated: 2020-06-29