MalFamAware: automatic family identification and malware classification through online clustering, International Journal of Information Security

International Journal of Information Security (IF 2.4), Pub Date: 2020-06-16, DOI: 10.1007/s10207-020-00509-4
Gregorio Pitolli, Giuseppe Laurenza, Leonardo Aniello, Leonardo Querzoni, Roberto Baldoni

The skyrocketing growth rate of new malware brings novel challenges to protecting computers and networks. Discerning truly novel malware from variants of known samples is one way to keep pace with this trend. This can be done by grouping known malware into families by similarity and classifying new samples into those families. As malware and their families evolve over time, approaches based on classifiers trained on a fixed ground truth are not suitable. Other techniques use clustering to identify families, but they need to periodically re-cluster the whole set of samples, which does not scale well. A promising approach is based on incremental clustering, where only the still-unknown samples are periodically clustered to identify new families, and classifiers are retrained accordingly. However, these solutions usually cannot immediately react to and identify new malware families. In this paper, we propose MalFamAware, a novel approach to malware family identification based on an online clustering algorithm, namely BIRCH, which efficiently updates clusters as new samples are fed in without re-scanning the entire dataset. MalFamAware is able both to classify new malware into existing families and to identify new families at runtime. We present experimental evaluations in which MalFamAware outperforms both total re-clustering and incremental clustering solutions in terms of accuracy and time. We also compare our solution with classifiers retrained over time, obtaining better accuracy, in particular when samples belong to yet unknown families.
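The reason BIRCH can update clusters without re-scanning old samples can be illustrated with a minimal sketch. BIRCH summarizes each subcluster by a clustering feature (CF), the triple (N, LS, SS) of sample count, linear sum and sum of squared norms, which is additive and can be updated in constant time per arriving sample. The sketch below is a hypothetical simplification, not the MalFamAware implementation: it keeps a flat list of CFs instead of BIRCH's CF-tree, assumes each malware sample has already been turned into a numeric feature vector, and uses an assumed radius `threshold` to decide whether a sample joins the nearest existing family or starts a new one. All class and method names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClusteringFeature:
    """BIRCH-style clustering feature (N, LS, SS) summarizing one subcluster."""
    n: int            # number of samples absorbed
    ls: List[float]   # linear sum of the member vectors
    ss: float         # sum of squared norms of the member vectors

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def merged(self, x: Sequence[float]) -> "ClusteringFeature":
        # CFs are additive: absorbing a sample needs only the summary,
        # never the previously seen samples themselves.
        return ClusteringFeature(
            self.n + 1,
            [a + b for a, b in zip(self.ls, x)],
            self.ss + sum(v * v for v in x),
        )

    def radius(self) -> float:
        # R^2 = SS/N - ||LS/N||^2, i.e. mean squared distance to the centroid.
        c = self.centroid()
        r2 = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error


class OnlineFamilyClusterer:
    """Hypothetical simplification: flat list of CFs plus a radius threshold
    (no CF-tree, no branching factor). Not the paper's implementation."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.families: List[ClusteringFeature] = []

    def add_sample(self, x: Sequence[float]) -> Tuple[int, bool]:
        """Return (family_id, is_new_family) for one feature vector x."""
        if self.families:
            # Nearest existing family by Euclidean distance to its centroid.
            best = min(
                range(len(self.families)),
                key=lambda i: math.dist(self.families[i].centroid(), x),
            )
            candidate = self.families[best].merged(x)
            if candidate.radius() <= self.threshold:
                # Sample is classified into an existing family.
                self.families[best] = candidate
                return best, False
        # Too far from every known family: a new family is identified at runtime.
        self.families.append(ClusteringFeature.from_point(x))
        return len(self.families) - 1, True


if __name__ == "__main__":
    clusterer = OnlineFamilyClusterer(threshold=1.0)
    stream = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [0.2, 0.3], [10.4, 9.8]]
    for sample in stream:
        print(sample, clusterer.add_sample(sample))
```

In this toy stream, the third sample is far from the first family and opens a second one, while later samples are absorbed into the family they are close to. The `threshold` loosely plays the role of BIRCH's threshold parameter: a smaller value yields more, tighter families. Full BIRCH additionally organizes CFs in a height-balanced tree bounded by a branching factor, so finding the closest entry does not require a linear scan over all families as this sketch does.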

Updated: 2020-06-16