Byte-level Malware Classification Based on Markov Images and Deep Learning
Computers & Security ( IF 5.6 ) Pub Date : 2020-05-01 , DOI: 10.1016/j.cose.2020.101740
Baoguo Yuan , Junfeng Wang , Dong Liu , Wen Guo , Peng Wu , Xuhua Bao

Abstract In recent years, malware attacks have become a serious security threat and have caused huge losses. Because malware variants are growing rapidly, classifying malware quickly and accurately is critical to cyber security. Traditional machine-learning methods are limited by feature engineering and struggle to process vast amounts of malware quickly, so malware classification based on malware images and deep learning has become an effective solution. However, the accuracy of the existing method based on gray images and deep learning (GDMC) still needs to be improved, and it depends heavily on the size of the training dataset. To improve accuracy, this paper proposes a byte-level malware classification method based on Markov images and deep learning, referred to as MDMC. The main step in MDMC is converting malware binaries into Markov images according to byte transfer probability matrices. A deep convolutional neural network is then used to classify the Markov images. Experiments are conducted on two malware datasets, the Microsoft dataset and the Drebin dataset. The average accuracy of MDMC is 99.264% and 97.364% on the two datasets, respectively. Further experiments with different proportions of training and testing data also show that MDMC performs better than GDMC.
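
The conversion step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a "Markov image" is a 256x256 matrix whose entry (i, j) is the estimated probability that byte value j immediately follows byte value i in the binary. The abstract does not give the exact normalization or any pixel scaling, so the row normalization below and the function name `markov_image` are illustrative assumptions, not the authors' implementation.

```python
from typing import List


def markov_image(data: bytes) -> List[List[float]]:
    """Build a 256x256 byte transition probability matrix (assumed form).

    Entry [i][j] is the fraction of occurrences of byte i that are
    immediately followed by byte j. Rows for bytes that never appear
    as a predecessor are left as all zeros.
    """
    # Count how often each byte value is followed by each other byte value.
    counts = [[0] * 256 for _ in range(256)]
    for cur, nxt in zip(data, data[1:]):
        counts[cur][nxt] += 1

    # Normalize each row so that non-empty rows sum to 1 (assumption:
    # the abstract does not specify the normalization used in the paper).
    image = []
    for row in counts:
        total = sum(row)
        image.append([c / total if total else 0.0 for c in row])
    return image


if __name__ == "__main__":
    # Toy input standing in for the raw bytes of a malware binary.
    sample = bytes([0, 1, 0, 1, 0, 2])
    img = markov_image(sample)
    print(len(img), len(img[0]))
    print(img[0][1], img[0][2], img[1][0])
```

Because the matrix size depends only on the byte alphabet, every binary maps to an image of the same dimensions regardless of file length, which makes the result straightforward to feed to a convolutional network.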

Updated: 2020-05-01