Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic
Computing (IF 3.3), Pub Date: 2020-10-21, DOI: 10.1007/s00607-020-00854-1
Seunghyun Park, Hyunhee Park

Network traffic data consist mostly of normal traffic, with attack traffic making up only a small fraction. This class imbalance degrades prediction performance, for example by biasing predictions on the minority data and by misclassifying normal data as outliers. Representative sampling methods for the imbalance problem include various oversampling-based models that synthesize minority data. However, because oversampling involves repeatedly learning the same data, the classification model can overfit the training data. Undersampling methods, in turn, can lose information because they remove data. To improve on both approaches, we propose an oversampling ensemble method based on the slow-start algorithm. The proposed combined oversampling and undersampling method based on the slow-start algorithm (COUSS) is modeled on the congestion control algorithm of the Transmission Control Protocol (TCP). Starting from a minimally undersampled dataset, the imbalanced dataset is oversampled until overfitting occurs. Simulation results on the KDD99 dataset show that COUSS improves the F1 score by 8.639%, 6.858%, 5.003%, and 4.074% compared with the synthetic minority oversampling technique (SMOTE), borderline-SMOTE, adaptive synthetic sampling, and generative adversarial network oversampling, respectively. The COUSS method can therefore be regarded as a practical solution for data analysis applications.
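The abstract does not spell out the COUSS procedure, so the following is only a minimal sketch of the general idea it describes: growing the amount of oversampling the way TCP slow start grows its congestion window, and stopping once a validation score suggests overfitting. The function name `slow_start_oversampling`, the `evaluate` callback, and the parameters `initial`, `ssthresh`, and `step` are all assumptions made for illustration and are not taken from the paper. The undersampling stage is left out; `evaluate` is assumed to run on an already undersampled dataset.

```python
from typing import Callable, List, Tuple


def slow_start_oversampling(
    evaluate: Callable[[int], float],
    initial: int = 1,
    ssthresh: int = 16,
    step: int = 1,
    max_rounds: int = 20,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Grow the oversampling amount in a TCP slow-start style.

    `evaluate(n)` is a hypothetical callback: it is assumed to add `n`
    synthetic minority samples, train a classifier, and return a
    validation score such as F1.
    """
    best_amount, best_score = 0, evaluate(0)  # baseline without oversampling
    history = [(0, best_score)]
    amount = initial

    for _ in range(max_rounds):
        score = evaluate(amount)
        history.append((amount, score))
        if score <= best_score:
            # No improvement: treated here as a sign of overfitting, so stop.
            break
        best_amount, best_score = amount, score
        if amount < ssthresh:
            amount *= 2      # slow-start phase: exponential growth
        else:
            amount += step   # above the threshold: additive growth
    return best_amount, history


if __name__ == "__main__":
    # Toy score curve (not real data) that peaks at 8 synthetic samples.
    best, trace = slow_start_oversampling(lambda n: 1 - abs(n - 8) / 100)
    print(best, [n for n, _ in trace])
```

In practice `evaluate` would wrap a real oversampler and classifier; the toy lambda only stands in so that the control loop can be exercised on its own.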

Updated: 2020-10-21