Heterogeneous Univariate Outlier Ensembles in Multidimensional Data
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-09-29, DOI: 10.1145/3403934
Guansong Pang, Longbing Cao

In outlier detection, recent major research has shifted from developing univariate methods to multivariate methods due to the rapid growth of multidimensional data. However, a typical issue with this paradigm shift is that many multidimensional datasets mainly contain univariate outliers, while many of their features are actually irrelevant. In such cases, multivariate methods are ineffective at identifying these outliers because of the potential biases and the curse of dimensionality introduced by the irrelevant features. Such univariate outliers might be detected well by applying univariate outlier detectors to the individually relevant features. However, choosing the right univariate detector for each feature is very challenging, since different features may follow very different probability distributions. To address this challenge, we introduce a novel Heterogeneous Univariate Outlier Ensembles (HUOE) framework and its instance ZDD, which synthesize a set of heterogeneous univariate outlier detectors as base learners to build heterogeneous ensembles that are optimized for each individual feature. Extensive results on 19 real-world datasets and a collection of synthetic datasets show that ZDD obtains a 5%–14% average AUC improvement over four state-of-the-art multivariate ensembles and is substantially more robust to irrelevant features.
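To make the general idea of per-feature heterogeneous univariate scoring concrete, here is a minimal NumPy sketch. It is not the paper's HUOE framework or its ZDD instance. The three base detectors (`zscore_scores`, `mad_scores`, `iqr_scores`), averaging them within each feature, and taking the maximum across features are illustrative assumptions, and all helper names are hypothetical. `roc_auc` computes AUC, the metric the abstract reports, as the Mann-Whitney probability that a random outlier outscores a random inlier.

```python
import numpy as np

# Illustrative sketch only: NOT the paper's HUOE framework or its ZDD instance.
# The detector pool, per-feature averaging, and max-over-features are assumptions.


def zscore_scores(x):
    """Absolute distance from the mean, in standard deviations."""
    sd = x.std()
    if sd == 0:
        return np.zeros(len(x))
    return np.abs(x - x.mean()) / sd


def mad_scores(x):
    """Robust z-score using the median and the median absolute deviation (MAD)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros(len(x))
    return 0.6745 * np.abs(x - med) / mad


def iqr_scores(x):
    """Distance outside the interquartile box [Q1, Q3], in units of the IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    if iqr == 0:
        return np.zeros(len(x))
    return (np.maximum(q1 - x, 0.0) + np.maximum(x - q3, 0.0)) / iqr


# Hypothetical pool of heterogeneous univariate base detectors.
DETECTORS = (zscore_scores, mad_scores, iqr_scores)


def feature_scores(x, detectors=DETECTORS):
    """Ensemble on a single feature: mean of the base detectors' scores."""
    return np.mean([d(x) for d in detectors], axis=0)


def ensemble_scores(X, detectors=DETECTORS):
    """Score every feature independently, then keep each row's largest deviation."""
    X = np.asarray(X, dtype=float)
    per_feature = np.column_stack(
        [feature_scores(X[:, j], detectors) for j in range(X.shape[1])]
    )
    return per_feature.max(axis=1)


def roc_auc(labels, scores):
    """AUC = P(random outlier scores above random inlier), ties counted as half."""
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy data: 10 Gaussian features, 5 outliers that deviate only in feature 0,
# so the other 9 features are irrelevant to these outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:5, 0] += 8.0
labels = np.zeros(500, dtype=bool)
labels[:5] = True

scores = ensemble_scores(X)
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Averaging several detectors within a feature hedges against any single detector being a poor fit for that feature's distribution. The paper instead builds ensembles optimized for each individual feature, which this sketch does not attempt.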

Updated: 2020-09-29