An Analysis of the KDD99 and UNSW-NB15 Datasets for the Intrusion Detection System,Symmetry



Symmetry (IF 2.2), Pub Date: 2020-10-13, DOI: 10.3390/sym12101666
Muataz Salam Al-Daweri, Khairul Akram Zainol Ariffin, Salwani Abdullah, Mohamad Firham Efendy Md. Senan

The rapid growth of internet technologies makes network security a crucial issue. An intrusion detection system (IDS) is needed to protect networks from various attacks. Despite the growing body of IDS research, few studies analyze the available IDS datasets. Therefore, this study presents a comprehensive analysis of the relevance of the features in the KDD99 and UNSW-NB15 datasets. Three methods were employed: rough-set theory (RST), a back-propagation neural network (BPNN), and a discrete variant of the cuttlefish algorithm (D-CFA). First, the dependency ratio between the features and the classes was calculated using RST. Second, each feature in the datasets was used as an input to the BPNN to measure its ability to classify each class. Third, a feature-selection process was carried out over multiple runs to record how frequently each feature was selected. The results indicated that some features in the KDD99 dataset could be used to achieve a classification accuracy above 84%. Moreover, a few features in both datasets were found to contribute strongly to classification performance: they appeared in feature combinations that yielded high accuracy, and they were frequently selected during the feature-selection process. The findings of this study are expected to help cybersecurity researchers build lightweight and accurate IDS models that use fewer features for developing technologies.
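The abstract does not spell out how the RST dependency ratio is computed. As a rough illustration only, the sketch below uses the standard rough-set dependency degree, the fraction of records whose feature values determine the class unambiguously (the positive region divided by the number of records). The function name `dependency_ratio` and the toy records are hypothetical and are not taken from the paper or from either dataset.

```python
from collections import defaultdict


def dependency_ratio(rows, feature_idx, class_idx):
    """Standard rough-set dependency degree |POS_P(D)| / |U|.

    rows        -- list of tuples, one per record (hypothetical toy data)
    feature_idx -- indices of the condition features P
    class_idx   -- index of the decision attribute D (the class label)
    """
    if not rows:
        raise ValueError("rows must not be empty")

    classes_by_block = defaultdict(set)  # labels seen in each equivalence class
    size_by_block = defaultdict(int)     # number of records in each class

    # Records with identical values on the chosen features are indiscernible
    # and fall into the same equivalence class.
    for row in rows:
        key = tuple(row[i] for i in feature_idx)
        classes_by_block[key].add(row[class_idx])
        size_by_block[key] += 1

    # The positive region holds the equivalence classes with a single label.
    positive = sum(
        size_by_block[key]
        for key, labels in classes_by_block.items()
        if len(labels) == 1
    )
    return positive / len(rows)


# Hypothetical records: (protocol, service, label).
toy = [
    ("tcp", "http", "normal"),
    ("tcp", "http", "normal"),
    ("udp", "dns", "normal"),
    ("tcp", "ftp", "attack"),
    ("icmp", "ecr_i", "attack"),
    ("tcp", "ftp", "normal"),
]

protocol_ratio = dependency_ratio(toy, [0], 2)
service_ratio = dependency_ratio(toy, [1], 2)
both_ratio = dependency_ratio(toy, [0, 1], 2)
```

On this toy data the protocol column alone determines the class for 2 of 6 records, while the service column does so for 4 of 6. Under this measure a higher ratio means the feature (or feature subset) carries more information about the class; the paper's exact formulation may differ.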


Updated: 2020-10-13
