Random Partitioning Forest for Point-Wise and Collective Anomaly Detection -- Application to Intrusion Detection
arXiv - CS - Machine Learning Pub Date : 2020-06-29 , DOI: arxiv-2006.16801
Pierre-Francois Marteau

In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees to detect point-wise and collective (as well as contextual) anomalies. Thanks to a distance-based paradigm used at the leaves of the trees, this semi-supervised approach solves a drawback that has been identified in the isolation forest (IF) algorithm. Moreover, taking into account the frequencies of visits in the leaves of the random trees significantly improves the performance of DiFF-RF in the presence of collective anomalies. DiFF-RF is fairly easy to train, and excellent performance can be obtained by using a simple semi-supervised procedure to set up the extra hyper-parameter that is introduced. We first evaluate DiFF-RF on a synthetic data set to i) verify that the limitation of the IF algorithm is overcome, ii) demonstrate how collective anomalies are actually detected, and iii) analyze the effect of the meta-parameters it involves. We then assess the DiFF-RF algorithm on a large set of datasets from the UCI repository, as well as on two benchmarks related to intrusion detection applications. Our experiments show that DiFF-RF almost systematically outperforms the IF algorithm, and also challenges the one-class SVM baseline and a deep learning variational auto-encoder architecture. Furthermore, our experience shows that DiFF-RF can work well when only small-scale training data is available, a setting that is, by contrast, difficult for deep neural architectures. Finally, DiFF-RF is computationally efficient and can be easily parallelized on multi-core architectures.
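
The two ingredients named in the abstract, a distance computed at the leaves and the frequency of visits in those leaves, can be illustrated with a toy sketch. This is not the author's implementation: the function names (`build_tree`, `fit_forest`, `anomaly_score`), the use of a leaf centroid, and the way distance and visit frequency are combined into one score are all assumptions made for illustration only.

```python
import math
import random


def build_tree(points, n_total, depth, max_depth, rng):
    """Recursively split `points` on a random feature at a random threshold."""
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    if depth >= max_depth or not left or not right:
        # Leaf: keep the centroid of the training points that landed here
        # and the fraction of the training sample that visited this leaf.
        n_dims = len(points[0])
        centroid = tuple(
            sum(p[d] for p in points) / len(points) for d in range(n_dims)
        )
        return {"centroid": centroid, "freq": len(points) / n_total}
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, n_total, depth + 1, max_depth, rng),
        "right": build_tree(right, n_total, depth + 1, max_depth, rng),
    }


def fit_forest(data, n_trees=25, sample_size=64, max_depth=8, seed=0):
    """Build each tree on a random subsample of (assumed normal) data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        forest.append(build_tree(sample, len(sample), 0, max_depth, rng))
    return forest


def find_leaf(node, x):
    """Route `x` down the tree until a leaf is reached."""
    while "split" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return node


def anomaly_score(forest, x):
    """Average, over trees, of the leaf distance weighted by leaf rarity.

    The weighting (2 - freq) is a hypothetical choice, not the paper's formula.
    """
    total = 0.0
    for tree in forest:
        leaf = find_leaf(tree, x)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, leaf["centroid"])))
        total += dist * (2.0 - leaf["freq"])
    return total / len(forest)
```

Under these assumptions, a point far from the training data is routed to a border leaf whose centroid is distant, so it scores higher than a point inside the training region, even though both may traverse a similar number of splits.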

Updated: 2020-07-01