SPiForest: An Anomaly Detecting Algorithm Using Space Partition Constructed by Probability Density-Based Inverse Sampling


IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2022-11-30, DOI: 10.1109/tnnls.2022.3223342
Xiansheng Yang, Yuan Zhuang, Min Shi, Xiaoxiang Cao, Dong Chen, Yufei Tang

SPiForest, a new isolation-based approach to outlier detection, constructs iTrees on the space containing all attributes by probability density-based inverse sampling. Most existing iForest (iF)-based approaches can precisely and quickly detect outliers scattered around one or more normal clusters. However, their performance degrades seriously on outliers whose "few and different" nature disappears in subspaces (e.g., anomalies surrounded by normal samples). SPiForest is proposed to solve this problem, and it differs from existing approaches as follows. First, SPiForest uses principal component analysis (PCA) to find the principal components and estimates each component's probability density function (pdf). Second, SPiForest uses the inv-pdf, which is inversely proportional to the pdf estimated from the given dataset, to generate support points in the space containing all attributes. Third, the hyperplane determined by these support points is used to isolate the outliers in the space. These steps are then repeated to build an iTree. Finally, many iTrees form a forest for outlier detection. SPiForest provides two benefits: 1) it isolates outliers with fewer hyperplanes, which significantly improves accuracy, and 2) it effectively detects outliers whose "few and different" nature disappears in subspaces. Comparative analyses and experiments show that SPiForest achieves a significant improvement in area under the curve (AUC) over state-of-the-art methods. Specifically, our method improves AUC by up to 17.7% compared with iF-based algorithms.
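The three steps in the abstract (PCA, inverse-density sampling of support points, a hyperplane split) can be illustrated with a rough Python sketch of a single split. This is not the authors' implementation. The abstract does not say how the pdf is estimated or how the support points determine the hyperplane, so the sketch assumes a histogram estimate with additive smoothing and, as a simplification, two support points whose perpendicular bisector serves as the splitting hyperplane. All function names and the `bins` parameter are hypothetical.

```python
import numpy as np

def pca_project(X):
    """Center X and rotate it onto its principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt.T

def sample_inverse_density(z, rng, bins=10):
    """Draw one coordinate along a component, favoring low-density regions.

    Assumption: the pdf is approximated by equal-width histogram counts, and
    the inverse weights use +1 smoothing so empty bins stay finite.
    """
    counts, edges = np.histogram(z, bins=bins)
    weights = 1.0 / (counts + 1.0)        # inversely proportional to estimated density
    weights /= weights.sum()
    k = rng.choice(bins, p=weights)       # pick a bin, sparse bins are more likely
    return rng.uniform(edges[k], edges[k + 1])

def draw_support_point(Z, rng, bins=10):
    """Sample one support point, one coordinate per principal component."""
    return np.array([sample_inverse_density(Z[:, j], rng, bins)
                     for j in range(Z.shape[1])])

def support_point_split(X, rng, bins=10):
    """Split X by a hyperplane in PCA space; returns a boolean mask.

    Assumption: the hyperplane is the perpendicular bisector of two
    support points, which the source does not confirm.
    """
    Z = pca_project(X)
    a = draw_support_point(Z, rng, bins)
    b = draw_support_point(Z, rng, bins)
    normal = b - a
    offset = normal @ (a + b) / 2.0
    return Z @ normal < offset

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
mask = support_point_split(X, rng)
print(mask.sum(), (~mask).sum())  # sizes of the two partitions
```

Applying such a split recursively to each side until points are isolated would yield one tree, and repeating with fresh random draws would yield a forest, mirroring the last two steps of the abstract.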


Updated: 2022-11-30
