Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision
Computers & Security ( IF 5.6 ) Pub Date : 2021-01-02 , DOI: 10.1016/j.cose.2020.102166
Ahmet Selman Bozkir , Ersan Tahillioglu , Murat Aydos , Ilker Kara

The ever-increasing use of information systems and online services has given rise to new types of malware that are more dangerous and harder to detect. In particular, according to recent reports, fileless malware infects victims' devices without leaving a persistent trace (i.e., a file) on the hard drive. Moreover, existing static malware detection methods in the literature often fail to detect sophisticated malware that uses various obfuscation and encryption techniques. Our contribution in this study is twofold. First, we present a novel approach that recognizes malware by capturing the memory dump of a suspicious process and representing it as an RGB image. In contrast to the conventional static and dynamic methods in the literature, we obtain and use memory data to reveal visual patterns that can be classified with computer vision and machine learning methods in a multi-class open-set recognition regime. Second, we apply a state-of-the-art manifold learning scheme named UMAP to improve the detection of unknown malware files through binary classification. Throughout the study, we employ our novel dataset of 4294 samples in total, covering 10 malware families along with benign executables. We obtained their memory dumps and converted them to RGB images by applying 3 different rendering schemes. To generate their signatures (i.e., feature vectors), we used GIST and HOG (Histogram of Oriented Gradients) descriptors as well as their combination. The resulting signatures were classified with the machine learning algorithms J48, RBF kernel-based SMO, Random Forest, XGBoost and linear SVM. According to the results of the first phase, we achieved a prediction accuracy of up to 96.39% by employing the SMO algorithm on the combined GIST+HOG feature vectors.
In addition, the UMAP-based manifold learning strategy improved the accuracy of the unknown malware recognition models by up to 12.93%, 21.83% and 20.78% on average for the Random Forest, linear SVM and XGBoost algorithms, respectively. Moreover, on a commercially available standard desktop computer, the suggested approach takes only 3.56 s on average for analysis. The results show that our vision-based scheme provides an effective protection mechanism against malicious applications.
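
The abstract does not specify the 3 rendering schemes or the descriptor settings, so the following is only a minimal sketch of the general idea, under stated assumptions: a byte buffer (standing in for a process memory dump) is packed three bytes per pixel into RGB rows, and a single magnitude-weighted histogram of unsigned gradient orientations is computed over the whole image. The function names, the fixed image width, the zero padding and the 9-bin histogram are all illustrative choices, not the authors' method, and the histogram is a much-simplified stand-in for a full HOG descriptor (no cells, blocks or normalization).

```python
import math


def bytes_to_rgb_rows(data: bytes, width: int = 64) -> list[list[tuple[int, int, int]]]:
    """Pack a byte buffer into rows of (R, G, B) pixels, three bytes per pixel.

    Assumption: consecutive bytes map to R, G, B in order; the paper's
    actual rendering schemes are not described in the abstract.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    row_bytes = width * 3
    # Zero-pad so the buffer fills a whole number of image rows.
    padded_len = math.ceil(len(data) / row_bytes) * row_bytes
    padded = data + bytes(padded_len - len(data))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, padded_len, 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


def to_gray(rows: list[list[tuple[int, int, int]]]) -> list[list[float]]:
    """Convert RGB rows to grayscale using the plain channel mean."""
    return [[sum(pixel) / 3 for pixel in row] for row in rows]


def orientation_histogram(gray: list[list[float]], bins: int = 9) -> list[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    Simplified, single-cell illustration of the idea behind HOG;
    border pixels are skipped because central differences need neighbors.
    """
    hist = [0.0] * bins
    if not gray:
        return hist
    height, width = len(gray), len(gray[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist


if __name__ == "__main__":
    dump = bytes(range(256)) * 48  # hypothetical stand-in for a memory dump
    image = bytes_to_rgb_rows(dump, width=64)
    features = orientation_histogram(to_gray(image))
    print(len(image), len(image[0]), len(features))
```

In a full pipeline of the kind the abstract describes, feature vectors like this would be passed to a classifier; that step is omitted here because it depends on external libraries and on settings the abstract does not give.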




Updated: 2021-01-14