A novel few-shot malware classification approach for unknown family recognition with multi-prototype modeling
Computers & Security (IF 4.8), Pub Date: 2021-04-18, DOI: 10.1016/j.cose.2021.102273
Peng Wang, Zhijie Tang, Junfeng Wang

New malware variants appear rapidly, making it increasingly difficult to classify malware into the correct families. This poses two challenges for malware classification. The first is the scarce-sample problem: collecting a large volume of samples from a newly detected malware family to train a classifier can be extremely hard, and training on a small number of samples inevitably leads to overfitting. The second is the dynamic recognition problem: most widely adopted classifiers are trained on predefined, known malware families and cannot incrementally identify novel families without being retrained from scratch. To tackle these challenges, in this study we employ a meta-learning-based few-shot learning (FSL) technique and propose a new few-shot malware classification model called SIMPLE (Supervised Infinite Mixture Prototypes LEarning). With the help of meta-learning, SIMPLE is trained on predefined malware families yet retains the ability to classify novel malware families that it has never encountered. Furthermore, the prior knowledge learned via meta-learning helps prevent the overfitting caused by scarce samples. Working from API invocation sequences obtained through dynamic analysis, SIMPLE introduces multi-prototype modeling, which generates multiple prototypes for each family to enhance generalization. This is inspired by the observation that behaviors within the same family often match multiple subpatterns and follow a multimodal data distribution. In extensive experiments, SIMPLE achieves state-of-the-art few-shot malware classification performance and outperforms all the baselines. With only 5 samples per family, SIMPLE reaches an accuracy of 90% on the 5-way classification task over novel malware families, which substantially addresses the problems of scarce samples and dynamic recognition. We also analyze why the multi-prototype and fast-adaptation features are effective, to make the results more interpretable.
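The multi-prototype idea in the abstract can be illustrated with a toy sketch. This is not the authors' SIMPLE implementation: the learned embedding network and meta-training are omitted, the 2-D vectors stand in for embedded API invocation sequences, and the function names, the greedy clustering rule, and the `threshold` parameter are all assumptions made for illustration. The sketch only shows why a family with two behavioral modes can be poorly represented by a single mean prototype.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(samples, threshold):
    """Greedy clustering of ONE family's support samples (hypothetical rule).

    A sample joins its nearest cluster unless the squared distance to that
    cluster's mean exceeds `threshold`, in which case it starts a new cluster.
    Returns one prototype (cluster mean) per cluster.
    """
    clusters = [[samples[0]]]
    for v in samples[1:]:
        dists = [sq_dist(v, mean(c)) for c in clusters]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] > threshold:
            clusters.append([v])
        else:
            clusters[best].append(v)
    return [mean(c) for c in clusters]

def classify(query, prototypes_by_family):
    """Return the family that owns the prototype nearest to `query`."""
    best_family, best_dist = None, math.inf
    for family, protos in prototypes_by_family.items():
        for p in protos:
            d = sq_dist(query, p)
            if d < best_dist:
                best_family, best_dist = family, d
    return best_family

# Made-up 5-shot support set: family_a has two modes, family_b has one.
support = {
    "family_a": [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2], [0.0, 0.2]],
    "family_b": [[1.4, 1.5], [1.6, 1.5], [1.5, 1.4], [1.5, 1.6], [1.5, 1.5]],
}

# Multi-prototype: a finite threshold lets family_a split into two clusters.
multi = {f: build_prototypes(s, threshold=4.0) for f, s in support.items()}
# Single-prototype baseline: an infinite threshold never opens a new cluster.
single = {f: build_prototypes(s, threshold=math.inf) for f, s in support.items()}

query = [0.1, 0.1]  # lies in family_a's first mode
print(classify(query, multi), classify(query, single))
```

In this toy data the single mean of family_a falls between its two modes, so a query near the first mode ends up closer to family_b's prototype; keeping one prototype per mode avoids that.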




Updated: 2021-05-08