Detecting outliers with one-class selective transfer machine
Knowledge and Information Systems (IF 2.7), Pub Date: 2019-10-11, DOI: 10.1007/s10115-019-01407-5
Hirofumi Fujita, Tetsu Matsukawa, Einoshin Suzuki

In this paper, we propose a method for detecting outliers in an unlabeled target dataset by exploiting an unlabeled source dataset. Outlier detection has attracted the attention of data miners for over two decades, since outliers can be crucial in decision making, knowledge discovery, and fraud detection, to name but a few applications. Because outliers are scarce and often tedious to label, researchers have proposed methods that detect them from an unlabeled dataset, some of which borrow strength from relevant labeled datasets in the framework of transfer learning. He et al. tackled a more challenging situation in which the input datasets, which come from multiple tasks, are all unlabeled. Their method, ML-OCSVM, conducts multi-task learning with one-class support vector machines (SVMs) and yields a mean model plus task-specific increments to detect outliers in the test datasets of the multiple tasks. We inherit part of their problem setting, taking only unlabeled datasets as input, but increase the difficulty by assuming only one source dataset in addition to the target dataset. Consequently, the source dataset consists of examples relevant to the target task as well as examples that are less relevant. To cope with this situation, we extend the Selective Transfer Machine, which weights individual examples in the framework of covariate shift and learns an SVM classifier, to our one-class setting by replacing the binary SVMs with one-class SVMs. Experiments on two public datasets and an artificial dataset show that our method mostly outperforms baseline methods, including ML-OCSVM and a state-of-the-art ensemble anomaly detection method, in F1 score and AUC.
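
The general idea of weighting source examples by their relevance to the target and then fitting a one-class SVM can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give the optimization details, so the sketch swaps in a crude kernel density-ratio estimate for the weights and performs a single weighted fit with scikit-learn's OneClassSVM. The function names, the synthetic data, the choice to train on the target examples with weight 1, and the values of gamma and nu are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def rbf_kernel_mean(a, b, gamma):
    """Mean RBF similarity of each row of `a` to all rows of `b`."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists).mean(axis=1)


def source_weights(source, target, gamma=0.5, eps=1e-12):
    """Crude density-ratio weights (target density / source density).

    Hypothetical helper: a stand-in for a covariate-shift weighting step,
    not the weighting scheme used in the paper.
    """
    ratio = rbf_kernel_mean(source, target, gamma) / (
        rbf_kernel_mean(source, source, gamma) + eps
    )
    # Rescale so that the weights average to 1 over the source examples.
    return ratio * len(source) / ratio.sum()


def fit_weighted_ocsvm(source, target, nu=0.1, gamma=0.5):
    """Fit a one-class SVM on target examples plus weighted source examples."""
    weights = source_weights(source, target, gamma)
    X = np.vstack([target, source])
    # Assumption: every target example gets weight 1.
    w = np.concatenate([np.ones(len(target)), weights])
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    model.fit(X, sample_weight=w)
    return model, weights


# Synthetic data (assumed): half of the source resembles the target,
# the other half comes from a distant, less relevant cluster.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(100, 2))
relevant = rng.normal(0.0, 1.0, size=(100, 2))
irrelevant = rng.normal(6.0, 1.0, size=(100, 2))
source = np.vstack([relevant, irrelevant])

model, weights = fit_weighted_ocsvm(source, target)

# predict returns +1 for inliers and -1 for outliers.
labels = model.predict(np.array([[0.0, 0.0], [6.0, 6.0]]))
print("mean weight, relevant source examples:  ", weights[:100].mean())
print("mean weight, irrelevant source examples:", weights[100:].mean())
print("predicted labels:", labels)
```

In this sketch the less relevant source cluster receives weights close to zero, so it contributes almost nothing to the fitted boundary. A method in the spirit of the abstract would presumably learn the weights and the one-class model with more care than this one-shot heuristic does.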

Updated: 2019-10-11