Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks
arXiv - CS - Cryptography and Security. Pub Date: 2021-07-30, DOI: arxiv-2107.14776
Alberto Mozo, Ángel González-Prieto, Antonio Pastor, Sandra Gómez-Canaval, Edgar Talavera

Due to the growing number of cyberattacks on the Internet, flow-based data sets are crucial for improving the performance of the Machine Learning (ML) components that run in network-based intrusion detection systems (IDS). To overcome the current shortage of network traffic data for attack analysis, recent works propose Generative Adversarial Networks (GANs) for generating synthetic flow-based network traffic. Data privacy is increasingly a strong requirement when processing such network data, which calls for solutions in which synthetic data can fully replace real data. Because GAN training converges poorly, none of the existing solutions can generate high-quality, fully synthetic data that can completely substitute for real data in the training of IDS ML components. Instead, they mix real and synthetic data, so the synthetic data acts only as data augmentation, and privacy can still be breached because real data is used. In sharp contrast, in this work we propose a novel deterministic way to measure the quality of the synthetic data produced by a GAN, both with respect to the real data and with respect to its performance when used for ML tasks. As a byproduct, we present a heuristic that uses these metrics to select the best-performing generator during GAN training, which yields a stopping criterion. An additional heuristic is proposed to select the best-performing GANs when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attack traffic and normal flow-based traffic data using an enhanced version of a Wasserstein GAN. We show that the generated synthetic network traffic can completely replace real data when training an ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
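The abstract does not define its quality metrics or selection heuristic, so the following is only a minimal sketch of the general idea of scoring generator checkpoints and stopping training, not the authors' method. It assumes (1) a per-feature one-dimensional Wasserstein-1 distance between real and synthetic flow records as the "similarity to real data" measure, and (2) a caller-supplied downstream score, for example the F1 of a detector trained only on synthetic flows and evaluated on real ones. The names `Checkpoint`, `select_best`, `should_stop`, and the `patience` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


def wasserstein_1d(real: Sequence[float], synth: Sequence[float]) -> float:
    """Exact W1 distance between two equal-size 1-D empirical samples.

    For equal sample sizes, the optimal transport plan pairs the i-th
    smallest real value with the i-th smallest synthetic value.
    """
    if len(real) != len(synth) or not real:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(real), sorted(synth))) / len(real)


def mean_feature_distance(real_rows, synth_rows) -> float:
    """Average the per-feature W1 distance over the columns of flow records."""
    n_features = len(real_rows[0])
    total = 0.0
    for j in range(n_features):
        total += wasserstein_1d([r[j] for r in real_rows], [s[j] for s in synth_rows])
    return total / n_features


@dataclass
class Checkpoint:
    """Hypothetical record of one evaluated generator snapshot."""
    epoch: int
    distance: float  # similarity to real data (assumed metric, lower is better)
    utility: float   # downstream ML score, e.g. F1 on real test flows (higher is better)


def select_best(checkpoints: List[Checkpoint]) -> Checkpoint:
    # Assumed heuristic: highest downstream utility first, smaller distance breaks ties.
    return min(checkpoints, key=lambda c: (-c.utility, c.distance))


def should_stop(checkpoints: List[Checkpoint], patience: int) -> bool:
    """Assumed stopping rule: stop once the best checkpoint is `patience` evaluations old."""
    best_index = checkpoints.index(select_best(checkpoints))
    return len(checkpoints) - 1 - best_index >= patience
```

In use, `mean_feature_distance` would be computed for each saved generator and stored, together with the downstream score, in a `Checkpoint`; `select_best` then picks the generator to keep. Ranking by downstream utility first reflects the abstract's stated goal of replacing real data in detector training, but the weighting between the two criteria is an arbitrary choice here.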

Updated: 2021-08-02