CREME: A toolchain of automatic dataset collection for machine learning in intrusion detection
Journal of Network and Computer Applications (IF 8.7), Pub Date: 2021-09-07, DOI: 10.1016/j.jnca.2021.103212
Huu-Khoi Bui, Ying-Dar Lin, Ren-Hung Hwang, Po-Ching Lin, Van-Linh Nguyen, Yuan-Cheng Lai

Intrusion detection is one of the most common approaches for addressing security attacks in modern networks. However, given the increasing diversity of attack behaviors, efficient detection is becoming more challenging. Machine learning (ML) has recently emerged as one of the most promising techniques for improving the detection accuracy of intrusion detection systems (IDS). With ML-based approaches, a quality training dataset is the key to achieving high detection performance. Unfortunately, there are few methods for assessing dataset quality, particularly for ML training. This work presents an automated toolchain, termed CREME (Configuration, REproduction, Multi-dataset, and Evaluation), to generate a dataset and measure its quality and efficiency. CREME integrates various tools to automate all stages of configuration, attack and benign behavior reproduction, data collection, feature extraction, data labeling, and evaluation. CREME can also automatically collect and generate a dataset from multiple sources such as accounting, network traffic, and system logs. Compared with the available datasets in the same category, experimental results show that the datasets generated by CREME contribute up to 20% better performance to ML-based IDS in terms of coverage. They are also significantly more efficient than most other datasets. The CREME source code is available at https://github.com/buihuukhoi/CREME.




Updated: 2021-09-10