An overview and a benchmark of active learning for outlier detection with one-class classifiers
Expert Systems with Applications (IF 7.5), Pub Date: 2020-11-26, DOI: 10.1016/j.eswa.2020.114372
Holger Trittenbach, Adrian Englhardt, Klemens Böhm

Active learning methods increase classification quality by means of user feedback. An important subcategory is active learning for outlier detection with one-class classifiers. While various methods in this category exist, selecting one for a given application scenario is difficult. This is because existing methods rely on different assumptions, have different objectives, and often are tailored to a specific use case. All this calls for a comprehensive comparison, the topic of this article.

This article starts with a categorization of the various methods. Interestingly, many assumptions in the literature are implicit, and their impact has not been discussed so far. Based on this, we propose a novel approach to evaluate active learning results by quantifying how classification results evolve with more user feedback, in a compact and nuanced manner. We run over 84,000 experiments to compare state-of-the-art one-class active learning methods, for a broad variety of scenarios. One key finding is that there is no single active learning method that stands out in a competitive evaluation. Instead, we found that selecting a good query strategy alone is not sufficient, since results hinge significantly on other factors, such as the selection of hyperparameter values. Our results show that some configurations are more robust than others. We conclude by phrasing our findings as guidelines on how to select active learning methods for outlier detection with one-class classifiers.
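To make the idea of quantifying how classification results evolve with more user feedback concrete, the following is a minimal sketch, not the evaluation method defined in the article. It assumes a learning curve is given as a list where entry i is some classification quality score after i feedback labels. The function name `summarize_learning_curve`, the parameter `k`, and the particular summary statistics are illustrative assumptions.

```python
from statistics import mean


def summarize_learning_curve(quality, k=5):
    """Condense a learning curve into a few summary numbers.

    quality[i] is assumed to be a classification quality score
    (any metric) measured after i rounds of user feedback.
    k is the number of final iterations averaged for the end quality.
    All names and statistics here are hypothetical illustrations.
    """
    if not quality:
        raise ValueError("quality must contain at least one value")
    k = max(1, min(k, len(quality)))  # clamp k to the curve length
    start = quality[0]
    end = quality[-1]
    return {
        "start_quality": start,                     # quality before any feedback
        "end_quality": end,                         # quality after the last round
        "quality_range": end - start,               # overall improvement
        "average_quality": mean(quality),           # mean over all iterations
        "average_end_quality": mean(quality[-k:]),  # mean over the last k rounds
    }


if __name__ == "__main__":
    # Made-up curve for demonstration only, not a result from the article.
    curve = [0.5, 0.6, 0.7, 0.8, 0.8, 0.8]
    for name, value in summarize_learning_curve(curve, k=3).items():
        print(f"{name}: {value:.3f}")
```

Summaries of this kind allow two query strategies to be compared on more than their final score: a strategy that improves quickly and then plateaus can be told apart from one that reaches the same end quality only after many feedback rounds.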




Updated: 2020-12-14