Malware Classification with Word Embedding Features, arXiv - CS - Cryptography and Security


arXiv - CS - Cryptography and Security, Pub Date: 2021-03-03, DOI: arxiv-2103.02711
Aparna Sunil Kale, Fabio Di Troia, Mark Stamp

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte $n$-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models -- a technique that we refer to as HMM2Vec -- and Word2Vec embeddings on these opcode sequences. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), $k$-nearest neighbor ($k$-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct substantial experiments over a variety of malware families. Our experiments extend well beyond any previous work in this field.
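
The pipeline described in the abstract (opcode sequences, then embedding vectors, then a classifier) can be illustrated with a small sketch. This is not the authors' implementation. To stay dependency-free, it swaps in plain windowed co-occurrence counts as a stand-in for trained Word2Vec or HMM2Vec embeddings, and it assumes one embedding per sample, flattened into a single feature vector and classified with $k$-NN. The opcode vocabulary, the toy sequences, the family labels, and all function names are hypothetical.

```python
from collections import Counter
from math import dist, sqrt


def cooccurrence_embeddings(seq, vocab, window=2):
    """Stand-in for Word2Vec/HMM2Vec: one count vector per opcode in vocab."""
    index = {op: i for i, op in enumerate(vocab)}
    vectors = {op: [0.0] * len(vocab) for op in vocab}
    for i, op in enumerate(seq):
        if op not in index:
            continue  # ignore opcodes outside the chosen vocabulary
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        for j in range(lo, hi):
            if j != i and seq[j] in index:
                vectors[op][index[seq[j]]] += 1.0
    return vectors


def sample_features(seq, vocab, window=2):
    """Concatenate the per-opcode vectors into one L2-normalised feature vector."""
    emb = cooccurrence_embeddings(seq, vocab, window)
    flat = [x for op in vocab for x in emb[op]]
    norm = sqrt(sum(x * x for x in flat)) or 1.0  # avoid division by zero
    return [x / norm for x in flat]


def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); Euclidean distance, majority vote."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: two "families" with different opcode patterns.
VOCAB = ["mov", "push", "call", "jmp"]
train_seqs = [
    (["mov", "push", "call", "mov", "push", "call", "mov", "push"], "family_A"),
    (["push", "call", "mov", "push", "call", "mov", "push", "call"], "family_A"),
    (["jmp", "jmp", "mov", "jmp", "jmp", "mov", "jmp", "jmp"], "family_B"),
    (["mov", "jmp", "jmp", "mov", "jmp", "jmp", "mov", "jmp"], "family_B"),
]
train = [(sample_features(seq, VOCAB), label) for seq, label in train_seqs]

query = ["call", "mov", "push", "call", "mov", "push", "call", "mov"]
predicted = knn_predict(train, sample_features(query, VOCAB), k=3)
print(predicted)
```

In a fuller experiment, the count vectors would be replaced by embeddings learned from the opcode sequences, and the $k$-NN step could be replaced by any of the classifiers the abstract lists (SVM, RF, or CNN) without changing the shape of the feature vectors.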


Updated: 2021-03-05
