ODRA: an outlier detection algorithm based on relevant attribute analysis method
Cluster Computing (IF 3.6) Pub Date: 2020-06-13, DOI: 10.1007/s10586-020-03136-9
Abdul Wahid, Annavarapu Chandra Sekhara Rao

Advances in data acquisition have generated enormous amounts of data capturing business, commercial, technological and scientific information. However, some occurrences remain rare or unusual regardless of how much data is available. In data mining, these rare occurrences are usually referred to as outliers or anomalies. Their frequency can range from 0.01% to 10%, depending on the type of application. In recent years, outlier detection has become important in many applications and has attracted considerable attention within data mining. This attention has produced several outlier detection algorithms, mostly based on distance or density. However, each approach has inherent weaknesses: distance-based methods have problems with local density, and density-based methods have problems with low-density patterns. In this paper, we present a new outlier detection algorithm based on relevant attribute analysis (ODRA) for local outlier detection in high-dimensional datasets. The proposed algorithm has two phases. In the first phase, we present a data reduction method that shrinks the dataset by pruning irrelevant attributes and data points. In the second phase, we propose an outlier detection method based on k-NN kernel density estimation. Experimental results on 15 datasets from the UCI Machine Learning Repository show the effectiveness of the proposed approach and its advantage over state-of-the-art outlier detection methods.
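
The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ODRA implementation: the abstract does not give the relevance criterion or the exact density estimator, so the sketch makes its own assumptions. The function names `prune_low_variance` and `knn_kde_scores`, the variance threshold `min_var`, the Gaussian kernel, and the single global bandwidth are all hypothetical choices made for illustration, and the pruning of data points mentioned in the abstract is omitted.

```python
import numpy as np


def prune_low_variance(X, min_var=1e-3):
    """Stand-in for phase 1 (assumption): drop near-constant attributes.

    The paper's relevant attribute analysis is not described in the
    abstract; a variance threshold is used here only as a placeholder.
    """
    keep = X.var(axis=0) > min_var
    return X[:, keep], keep


def knn_kde_scores(X, k=5):
    """Phase 2 sketch: Gaussian kernel density over the k nearest neighbours.

    Returns the negative log-density of each point, so a higher score
    means the point lies in a sparser region (more outlying).
    """
    n, d = X.shape
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Exclude each point from its own neighbourhood.
    np.fill_diagonal(dist, np.inf)
    # Distances to the k nearest neighbours of every point, shape (n, k).
    knn = np.sort(dist, axis=1)[:, :k]
    # Global bandwidth (assumption): mean distance to the k-th neighbour.
    h = max(knn[:, -1].mean(), 1e-12)
    # Average Gaussian kernel value over the k neighbours.
    dens = np.exp(-0.5 * (knn / h) ** 2).mean(axis=1) / (h ** d)
    return -np.log(dens + 1e-300)
```

To use the sketch, pass a numeric array of shape `(n_samples, n_features)` through `prune_low_variance` and then score the reduced array with `knn_kde_scores`; the points with the largest scores are the outlier candidates. A brute-force distance matrix keeps the example short but costs O(n²) memory, so a spatial index would be preferable on larger datasets.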




Updated: 2020-06-13