Average Localised Proximity: A new data descriptor with good default one-class classification performance
Pattern Recognition (impact factor 7.5), Pub Date: 2021-04-22, DOI: 10.1016/j.patcog.2021.107991
Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, without any counter-examples. A number of data descriptors that have performed well in previous studies of one-class classification, like the Support Vector Machine (SVM), require setting one or more hyperparameters. To date, there has been no systematic attempt to determine optimal default values for these hyperparameters, which limits their ease of use, especially in comparison with hyperparameter-free proposals like the Isolation Forest (IF). We address this issue by determining optimal default hyperparameter values across a collection of 246 one-class classification problems derived from 50 different real-world datasets. In addition, we propose a new data descriptor, Average Localised Proximity (ALP), to address certain issues with existing approaches based on nearest neighbour distances. Finally, we evaluate classification performance using a leave-one-dataset-out procedure, and find strong evidence that ALP outperforms IF and a number of other data descriptors, as well as weak evidence that it outperforms SVM, making ALP a good default choice.
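
The leave-one-dataset-out idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a score (for example AUROC) has already been computed for every dataset and every candidate hyperparameter value, and the function name `leave_one_dataset_out` and the numbers in `toy_scores` are hypothetical placeholders, not results from the paper. For each held-out dataset, the default is the value with the best mean score on the remaining datasets, and that default is then scored on the held-out dataset.

```python
from statistics import mean


def leave_one_dataset_out(scores):
    """Pick a default hyperparameter value without looking at the held-out dataset.

    scores: {dataset_name: {hyperparameter_value: score}}
    Returns {dataset_name: (chosen_value, held_out_score)}.
    """
    results = {}
    for held_out in scores:
        others = [name for name in scores if name != held_out]
        candidates = scores[held_out].keys()
        # Default = candidate with the best mean score on all other datasets.
        best = max(candidates, key=lambda v: mean(scores[d][v] for d in others))
        # Report how that default performs on the dataset it never saw.
        results[held_out] = (best, scores[held_out][best])
    return results


# Placeholder numbers for illustration only; not taken from the paper.
toy_scores = {
    "A": {1: 0.70, 5: 0.80, 10: 0.75},
    "B": {1: 0.60, 5: 0.65, 10: 0.90},
    "C": {1: 0.85, 5: 0.80, 10: 0.70},
}
results = leave_one_dataset_out(toy_scores)
for name, (value, score) in results.items():
    print(name, value, score)
```

In this toy table, the default chosen for a held-out dataset need not be the value that is best on that dataset itself, which is the point of the procedure: the reported score reflects how a default would fare on unseen data.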




Updated: 2021-05-30