NetFlow Datasets for Machine Learning-based Network Intrusion Detection Systems,arXiv - CS - Networking and Internet Architecture

NetFlow Datasets for Machine Learning-based Network Intrusion Detection Systems
arXiv - CS - Networking and Internet Architecture. Pub Date: 2020-11-18, DOI: arxiv-2011.09144
Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, Marius Portmann

Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) have proven to be a reliable intelligence tool for protecting networks against cyberattacks. Network data features have a great impact on the performance of ML-based NIDSs. However, evaluations of ML models are often unreliable, as each ML-enabled NIDS is trained and validated using different data features that may not contain security events. Therefore, a common feature set shared across multiple datasets is required to evaluate an ML model's detection accuracy and its ability to generalise across datasets. This paper presents NetFlow features from four benchmark NIDS datasets, UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018, generated from their publicly available packet capture files. In a real-world scenario, NetFlow features are relatively easy to extract from network traffic compared to the complex features used in the original datasets, as they are usually extracted from packet headers. The generated NetFlow datasets have been labelled for binary and multi-class learning tasks. Preliminary results indicate that, across the four datasets, NetFlow features lead to similar binary classification results and lower multi-class classification results compared to the respective original feature sets. The NetFlow datasets, named NF-UNSW-NB15, NF-BoT-IoT, NF-ToN-IoT, NF-CSE-CIC-IDS2018, and NF-UQ-NIDS, are published at http://staff.itee.uq.edu.au/marius/NIDS_datasets/ for research purposes.
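The idea of labelling flows for both binary and multi-class tasks, and of merging datasets on a shared feature set, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the field names (`in_bytes`, `out_bytes`, `attack`, `label`), the toy records, and the helper functions are hypothetical and are not the published schema or the authors' tooling.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative assumptions,
# not rows from the published datasets.
FLOWS = [
    {"dataset": "NF-UNSW-NB15", "in_bytes": 480, "out_bytes": 1200, "attack": "Benign"},
    {"dataset": "NF-BoT-IoT", "in_bytes": 60, "out_bytes": 0, "attack": "DDoS"},
    {"dataset": "NF-ToN-IoT", "in_bytes": 900, "out_bytes": 300, "attack": "Benign"},
    {"dataset": "NF-CSE-CIC-IDS2018", "in_bytes": 75, "out_bytes": 40, "attack": "DoS"},
]


def add_binary_label(flow, benign_name="Benign"):
    """Return a copy of the flow with a 0/1 label derived from its attack class."""
    labelled = dict(flow)
    labelled["label"] = 0 if flow["attack"] == benign_name else 1
    return labelled


def merge_on_common_features(flows):
    """Keep only the fields present in every record so all rows share one schema."""
    common = set(flows[0])
    for flow in flows[1:]:
        common &= set(flow)
    return [{key: flow[key] for key in sorted(common)} for flow in flows]


labelled = [add_binary_label(flow) for flow in FLOWS]
merged = merge_on_common_features(labelled)

# Class distributions for the binary task and the multi-class task.
binary_counts = Counter(flow["label"] for flow in merged)
class_counts = Counter(flow["attack"] for flow in merged)
```

Because every record ends up with the same fields, a model trained on rows from one source can, in principle, be evaluated on rows from another without any feature remapping, which is the kind of cross-dataset comparison the abstract motivates.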

Updated: 2020-11-19
