Deep Learning and Visualization for Identifying Malware Families
IEEE Transactions on Dependable and Secure Computing (IF 7.3), Pub Date: 2021-01-01, DOI: 10.1109/tdsc.2018.2884928
Guosong Sun, Quan Qian

The growing threat of malware is increasingly difficult to ignore. In this paper, we use a method for generating malware feature images that combines static analysis of malicious code with recurrent neural networks (RNNs) and convolutional neural networks (CNNs). By using an RNN, our method considers not only the original information of the malware but also the ability to associate the original code with timing characteristics; furthermore, the process reduces the dependence on malware category labels. We then use minhash to generate feature images from the fusion of the original codes and the codes predicted by the RNN. Finally, we train a CNN to classify the feature images. When we trained on very few samples (a ratio of training set size to validation set size of 1:30), we obtained an accuracy of over 92 percent. When we adjusted the ratio to 3:1, the accuracy exceeded 99.5 percent. As the confusion matrices show, the worst false positive rate across all malware families is 0.0147 and the average false positive rate is 0.0058.
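
The minhash step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the tokenization, the hash family, the image size, or how the original and predicted codes are fused. The sketch below assumes opcode n-grams as set elements, seeded BLAKE2b as the hash family, simple concatenation as the fusion, and a reduction of each minhash value modulo 256 to obtain a grayscale pixel. All function names and the example opcode lists are hypothetical.

```python
import hashlib


def ngrams(tokens, n=3):
    """Return the set of n-grams (joined with spaces) in a token sequence."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def seeded_hash(item, seed):
    """Hash a string to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes):
    """Minhash: for each seeded hash function, keep the minimum over the set."""
    if not items:
        raise ValueError("cannot compute a minhash signature of an empty set")
    return [min(seeded_hash(it, seed) for it in items)
            for seed in range(num_hashes)]


def feature_image(tokens, side=8, n=3):
    """Map a token sequence to a side x side grayscale image (values 0..255).

    Assumption: one minhash value per pixel, reduced modulo 256.
    """
    signature = minhash_signature(ngrams(tokens, n), side * side)
    pixels = [value % 256 for value in signature]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    # Hypothetical opcode sequences; "predicted" stands in for RNN output.
    original = ["push", "mov", "sub", "call", "add", "pop", "ret"]
    predicted = ["mov", "sub", "call", "add", "pop", "ret", "nop"]
    # Assumption: fusion is plain concatenation of the two sequences.
    image = feature_image(original + predicted, side=4, n=2)
    for row in image:
        print(row)
```

Because each minhash value depends only on the set of n-grams, two samples that share most of their n-grams are likely to agree on many pixels, which is the property that would make such images usable as CNN input.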

Updated: 2021-01-01