Robust support vector data description for novelty detection with contaminated data, Engineering Applications of Artificial Intelligence



Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2020-02-29, DOI: 10.1016/j.engappai.2020.103554
Kunzhe Wang, Haibin Lan

Support vector data description (SVDD) is a widely used novelty detection algorithm. It provides excellent predictions even in the absence of negative samples and retains the mathematical elegance of Support Vector Machines. The decision boundary can be very flexible due to the incorporation of kernel functions. However, SVDD can suffer considerably from contaminated data containing, for example, outliers or mislabeled observations. Although several weighting schemes have been proposed to find a more reliable description of the target data, the calculation of the weights is itself affected by the outliers and does not provide much insight into the data: masked outliers fail to receive lower weight values. The Stahel–Donoho (SD) outlyingness from multivariate statistics is a very robust measure for exposing outliers. To avoid the masking effect, we propose to assign a weight to each observation based on its SD outlyingness in an arbitrary kernel space. A robust SVDD is then defined by down-weighting the samples with large outlyingness. The experimental results demonstrate the superiority of the proposed method in terms of AUC for contaminated data.
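To make the weighting idea in the abstract more concrete, the following is a minimal sketch of how an SD outlyingness score might be approximated and turned into sample weights. It is not the authors' implementation. It rests on several assumptions: the outlyingness is computed in the input space rather than in a kernel space, the supremum over directions is approximated with random unit directions, and the function names (`sd_outlyingness`, `outlyingness_weights`), the number of directions, and the cutoff rule (median plus 2.5 times the MAD of the scores) are hypothetical choices made only for illustration.

```python
import math
import random
import statistics


def sd_outlyingness(points, n_directions=500, seed=0):
    """Approximate Stahel-Donoho outlyingness with random projections.

    For each direction u, a point's score is |u.x - median| / MAD of the
    projected sample; the outlyingness is the largest score over all
    sampled directions. Input-space version (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    scores = [0.0] * len(points)
    for _ in range(n_directions):
        # Draw a direction from an isotropic Gaussian and normalise it.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            continue
        proj = [sum(c * x for c, x in zip(u, p)) / norm for p in points]
        med = statistics.median(proj)
        mad = statistics.median([abs(v - med) for v in proj])
        if mad == 0.0:
            continue  # degenerate direction, skip it
        for i, v in enumerate(proj):
            scores[i] = max(scores[i], abs(v - med) / mad)
    return scores


def outlyingness_weights(scores, cutoff=None):
    """Map outlyingness to weights in [0, 1] (hypothetical weighting rule).

    Scores at or below the cutoff keep weight 1; larger scores are
    down-weighted as (cutoff / score) ** 2.
    """
    if cutoff is None:
        med = statistics.median(scores)
        mad = statistics.median([abs(s - med) for s in scores])
        cutoff = med + 2.5 * mad
    return [1.0 if s <= cutoff else (cutoff / s) ** 2 for s in scores]
```

Under these assumptions, the resulting weights could be used to scale each sample's slack penalty in a weighted SVDD solver, so that samples with large outlyingness have less influence on the fitted boundary. Because the median and MAD are themselves robust statistics, a single far-away observation should not be able to hide itself by inflating the scale estimate, which is the masking effect the abstract refers to.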


Updated: 2020-02-29
