Detecting cybersecurity attacks across different network features and learners, Journal of Big Data



Detecting cybersecurity attacks across different network features and learners
Journal of Big Data (IF 8.6), Pub Date: 2021-02-23, DOI: 10.1186/s40537-021-00426-w
Joffrey L. Leevy, John Hancock, Richard Zuech, Taghi M. Khoshgoftaar

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier, whether Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
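All three research questions are framed in terms of AUC and F1-score, so a small illustration of how these two metrics are computed may help. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, with label 1 standing for attack traffic and 0 for normal traffic.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]              # classifier scores (assumed)
y_pred = [1 if s >= 0.5 else 0 for s in scores]        # assumed 0.5 threshold

print(f"AUC = {auc(y_true, scores):.3f}")
print(f"F1  = {f1_score(y_true, y_pred):.3f}")
```

The pairwise AUC loop is quadratic in the number of instances and is meant only to make the definition concrete; for a dataset on the scale of CSE-CIC-IDS2018, a sort-based implementation from an established library would be the practical choice. Note that AUC depends only on the ranking of scores, whereas F1 depends on the chosen threshold.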


Updated: 2021-02-23
