Random Partitioning Forest for Point-Wise and Collective Anomaly Detection: Application to Network Intrusion Detection
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2021-01-13, DOI: 10.1109/tifs.2021.3050605
Pierre-Francois Marteau

In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees to detect point-wise and collective (as well as contextual) anomalies. Thanks to a distance-based paradigm used at the leaves of the trees, this semi-supervised approach overcomes a drawback that has been identified in the isolation forest (IF) algorithm. Moreover, taking into account the frequencies of visits in the leaves of the random trees significantly improves the performance of DiFF-RF in the presence of collective anomalies. DiFF-RF is fairly easy to train, and good performance can be obtained by using a simple semi-supervised procedure to set up the extra hyper-parameter that is introduced. We first evaluate DiFF-RF on a synthetic data set to i) verify that the limitation of the IF algorithm is overcome, ii) demonstrate how collective anomalies are actually detected, and iii) analyze the effect of the meta-parameters it involves. We then assess the DiFF-RF algorithm on a large set of datasets from the UCI repository, as well as on four benchmarks related to network intrusion detection applications. Our experiments show that DiFF-RF almost systematically outperforms the IF algorithm and one of its extended variants, and also challenges the one-class SVM baseline, a deep learning variational auto-encoder, and an ensemble of auto-encoders. Finally, DiFF-RF is computationally efficient and can be easily parallelized on multi-core architectures.
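The idea of scoring a point by its distance to what a leaf has seen during training can be illustrated with a toy sketch. This is not the paper's implementation: the abstract gives no formulas, so the score used here (plain Euclidean distance to the leaf centroid, averaged over trees), the helper names (`build_tree`, `find_leaf`, `build_forest`, `distance_score`), the hyper-parameter values, and the ring-shaped toy data are all assumptions made for illustration. The stored leaf population `count` is likewise only a hypothetical example of a statistic that a visit-frequency score could draw on.

```python
import numpy as np


def make_leaf(X):
    # Assumed leaf summary: centroid of the training points that reached the
    # leaf, plus their count (not used by the distance score below).
    return {"leaf": True, "centroid": X.mean(axis=0), "count": X.shape[0]}


def build_tree(X, rng, depth=0, max_depth=12, min_size=8):
    """Recursively split X on a random feature at a random threshold."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    splittable = np.flatnonzero(hi > lo)
    if depth >= max_depth or X.shape[0] <= min_size or splittable.size == 0:
        return make_leaf(X)
    f = int(rng.choice(splittable))
    t = rng.uniform(lo[f], hi[f])
    mask = X[:, f] < t
    if mask.all() or not mask.any():
        # Degenerate split (threshold landed on a boundary): stop here.
        return make_leaf(X)
    return {
        "leaf": False,
        "feature": f,
        "threshold": t,
        "left": build_tree(X[mask], rng, depth + 1, max_depth, min_size),
        "right": build_tree(X[~mask], rng, depth + 1, max_depth, min_size),
    }


def find_leaf(node, x):
    """Route x down the tree and return the leaf it reaches."""
    while not node["leaf"]:
        side = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[side]
    return node


def build_forest(X, n_trees=100, sample_size=256, seed=0):
    """Train each tree on a random subsample of (assumed normal) data."""
    rng = np.random.default_rng(seed)
    size = min(sample_size, X.shape[0])
    forest = []
    for _ in range(n_trees):
        idx = rng.choice(X.shape[0], size=size, replace=False)
        forest.append(build_tree(X[idx], rng))
    return forest


def distance_score(forest, x):
    """Mean distance from x to the centroid of the leaf it reaches in each tree."""
    dists = [np.linalg.norm(x - find_leaf(tree, x)["centroid"]) for tree in forest]
    return float(np.mean(dists))


# Toy data (an assumption): normal points lie on a noisy ring of radius 3,
# so the empty center is anomalous even though it sits inside the data range.
data_rng = np.random.default_rng(42)
angles = data_rng.uniform(0.0, 2.0 * np.pi, 1000)
radii = 3.0 + data_rng.normal(0.0, 0.1, 1000)
ring = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

forest = build_forest(ring)
center_score = distance_score(forest, np.zeros(2))
ring_score = float(np.mean([distance_score(forest, p) for p in ring[:50]]))
print("center:", center_score, "ring average:", ring_score)
```

In this sketch the query point at the empty center is routed to leaves whose training points all sit on the ring, so its distance-based score should come out well above the average score of the ring points themselves. Only normal data is used to build the trees, which matches the semi-supervised setting described in the abstract.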

Updated: 2024-08-22