Detecting web attacks using random undersampling and ensemble learners, Journal of Big Data



Detecting web attacks using random undersampling and ensemble learners
Journal of Big Data (IF 8.1), Pub Date: 2021-05-27, DOI: 10.1186/s40537-021-00460-8
Richard Zuech, John Hancock, Taghi M. Khoshgoftaar

Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). As classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both used to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” The third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answer to all three research questions is “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios.
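To make the sampling ratios in the abstract concrete, here is a minimal sketch of random undersampling to a target class ratio. It is not the authors' code. It assumes each ratio reads as majority (normal traffic) to minority (web attack), that all minority rows are kept, and that majority rows are discarded uniformly at random; the function name `random_undersample`, the `RUS_RATIOS` table, and the toy labels are hypothetical. The "no sampling" setting corresponds to skipping this step.

```python
import random
from typing import Dict, List, Sequence


def random_undersample(
    labels: Sequence[int],
    majority_to_minority: float,
    minority_label: int = 1,
    seed: int = 0,
) -> List[int]:
    """Return sorted row indices to keep after random undersampling.

    Every minority row is kept. Majority rows are sampled without
    replacement until the majority:minority ratio is approximately
    `majority_to_minority`. (Assumed reading of the ratios.)
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]

    # Number of majority rows needed to reach the requested ratio.
    target = round(len(minority_idx) * majority_to_minority)
    if target >= len(majority_idx):
        # The data is already at or below the requested ratio: keep everything.
        return sorted(minority_idx + majority_idx)

    kept_majority = rng.sample(majority_idx, target)
    return sorted(minority_idx + kept_majority)


# The seven sampled ratios from the abstract, written as majority/minority.
RUS_RATIOS: Dict[str, float] = {
    "999:1": 999.0,
    "99:1": 99.0,
    "95:5": 95 / 5,
    "9:1": 9.0,
    "3:1": 3.0,
    "65:35": 65 / 35,
    "1:1": 1.0,
}

if __name__ == "__main__":
    # Toy labels: 10,000 normal rows (0) and 10 attack rows (1).
    toy_labels = [0] * 10_000 + [1] * 10
    for name, ratio in RUS_RATIOS.items():
        kept = random_undersample(toy_labels, ratio)
        n_attack = sum(toy_labels[i] for i in kept)
        print(f"{name}: normal={len(kept) - n_attack}, attack={n_attack}")
```

In an experiment of this kind, undersampling would normally be applied to the training split only, so that AUC and AUPRC are measured on data with the original class distribution; the abstract does not state this detail, so treat it as an assumption.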


Updated: 2021-05-28
