Resampling imbalanced data for network intrusion detection datasets
Journal of Big Data ( IF 8.1 ) Pub Date : 2021-01-06 , DOI: 10.1186/s40537-020-00390-x
Sikha Bagui , Kunqi Li

Machine learning plays an increasingly significant role in building Network Intrusion Detection Systems. However, machine learning models trained on imbalanced cybersecurity data cannot effectively recognize minority data, and hence attacks. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This research examines the influence of resampling on the performance of Artificial Neural Network multi-class classifiers. Five resampling methods were applied to the benchmark cybersecurity datasets KDD99, UNSW-NB15, UNSW-NB17 and UNSW-NB18: random undersampling; random oversampling; random undersampling combined with random oversampling; random undersampling with the Synthetic Minority Oversampling Technique (SMOTE); and random undersampling with the Adaptive Synthetic Sampling Method (ADASYN). Macro precision, macro recall and macro F1-score were used to evaluate the results. The patterns found were: first, oversampling increases the training time and undersampling decreases it; second, if the data is extremely imbalanced, both oversampling and undersampling increase recall significantly; third, if the data is not extremely imbalanced, resampling has little impact; fourth, with resampling, mostly oversampling, more of the minority data (attacks) were detected.
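To make the resampling and evaluation ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `random_resample` and `macro_scores`, the `target` parameter, and the toy labels are all assumptions made for illustration. It covers only random undersampling and random oversampling (not SMOTE or ADASYN), and it takes macro F1 to be the mean of the per-class F1 scores, which is one common convention and may differ from the paper's definition.

```python
import random
from collections import Counter, defaultdict


def random_resample(X, y, target, seed=0):
    """Resample so that every class ends up with exactly `target` rows.

    Classes with at least `target` rows are randomly undersampled
    (without replacement); smaller classes are randomly oversampled
    by duplicating rows (with replacement). Hypothetical helper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)  # undersample majority
        else:
            extra = rng.choices(rows, k=target - len(rows))
            chosen = rows + extra  # oversample minority
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Each metric is computed per class and averaged with equal weight,
    so a rare attack class counts as much as the normal class.
    """
    labels = set(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy, made-up data: 95 "normal" rows and 5 "attack" rows.
X = [[i] for i in range(100)]
y = ["normal"] * 95 + ["attack"] * 5
X_res, y_res = random_resample(X, y, target=20, seed=1)
print(Counter(y_res))
```

In this sketch, a classifier that labels everything as "normal" on a set of 8 normal and 2 attack samples reaches 80% accuracy but a macro recall of only 0.5, which illustrates why macro-averaged metrics are a natural fit for imbalanced intrusion data. In practice, resampling would be applied to the training split only, so that evaluation reflects the original class distribution.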




Updated: 2021-01-07