Unsupervised outlier detection in multidimensional data
Journal of Big Data (IF 8.1), Pub Date: 2021-06-02, DOI: 10.1186/s40537-021-00469-z
Atiq ur Rehman, Samir Brahim Belhaouari

Detecting and removing outliers is a fundamental preprocessing task; without it, analysis of the data can be misleading. Anomalies in the data can also heavily degrade the performance of machine learning algorithms. This paper proposes several novel statistical techniques for detecting anomalies in a dataset in an unsupervised manner. The techniques are based on statistical methods that consider data compactness and other properties. They are found to be efficient in terms of performance, ease of implementation, and computational complexity. Two of the proposed techniques transform the data to a unidimensional distance space to detect outliers, so they remain computationally inexpensive and feasible regardless of the dimensionality of the data. A comprehensive performance analysis of the proposed anomaly detection schemes is presented, and the new schemes are found to outperform state-of-the-art methods when tested on several benchmark datasets.
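
The general idea of reducing multidimensional data to a one-dimensional distance space can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which reference point, distance, or threshold the paper uses. The sketch assumes Euclidean distance to the coordinate-wise median and a Tukey fence (Q3 + 1.5 × IQR) on the resulting distances, and the function names `distance_to_center` and `flag_outliers` are hypothetical.

```python
import math
import statistics


def distance_to_center(points):
    """Map each d-dimensional point to one number: its Euclidean
    distance from the coordinate-wise median (an assumed reference point)."""
    dims = len(points[0])
    center = [statistics.median(p[i] for p in points) for i in range(dims)]
    return [math.dist(p, center) for p in points]


def flag_outliers(points, k=1.5):
    """Flag points whose distance exceeds Q3 + k * IQR.

    The Tukey fence is an assumed rule chosen for illustration,
    not a threshold taken from the paper.
    """
    distances = distance_to_center(points)
    q1, _, q3 = statistics.quantiles(distances, n=4)
    threshold = q3 + k * (q3 - q1)
    return [d > threshold for d in distances]


# Made-up 3-dimensional sample: six clustered points and one distant point.
data = [
    (1.0, 2.0, 1.5),
    (1.2, 1.9, 1.4),
    (0.9, 2.1, 1.6),
    (1.1, 2.0, 1.5),
    (1.0, 1.8, 1.5),
    (1.05, 2.05, 1.45),
    (9.0, 9.0, 9.0),
]
flags = flag_outliers(data)
```

Once the distances are computed, the thresholding step works on a single list of numbers, so its cost does not depend on the number of dimensions. Only the distance computation itself scales with dimensionality.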




Updated: 2021-06-02