Semi-supervised two-phase familial analysis of Android malware with normalized graph embedding
Knowledge-Based Systems (IF 8.8), Pub Date: 2021-02-13, DOI: 10.1016/j.knosys.2021.106802
Qian Li, Qingyuan Hu, Yong Qi, Saiyu Qi, Xinxing Liu, Pengfei Gao

With the widespread use of smartphones, Android malware poses a serious threat to their security. Given the explosive growth of Android malware variants, detecting malware families is crucial for identifying new security threats, triaging, and building reference datasets. Building behavior profiles of Android applications (apps) from holistic graph-based features helps to retain program semantics and resist obfuscation. Using a low-dimensional feature representation is more effective, since it reduces computation cost and improves the efficiency of downstream analytics tasks. To this end, we design and develop GSFDroid, a practical system for the familial analysis of Android malware. We first use graph-based features that contain structural information to analyze app behavior. Then, we employ Graph Convolutional Networks (GCNs) to embed nodes into a continuous, low-dimensional space, which improves the efficiency of downstream analytics tasks. Note that, because of the random initialization and propagation strategy of GCN, the distributions of the learned APK feature vectors are neither aligned nor centered, and their differing scales can harm the performance of downstream tasks. Inspired by the z-score, we propose a simple graph feature normalization to standardize the embedded APK features. Finally, instead of fully supervised or unsupervised learning, we propose a two-phase familial analysis method that combines a semi-supervised classifier with a clustering operation on the samples to which the classifier assigns high uncertainty scores. Promising experimental results on real-world datasets demonstrate that our approach significantly outperforms state-of-the-art approaches and can effectively cluster new malware samples from unknown families.
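The two ideas named in the abstract, z-score style normalization of embedded APK features and routing high-uncertainty samples to a clustering step, can be illustrated with a minimal sketch. This is not GSFDroid's implementation: the abstract does not give the exact normalization formula or the uncertainty score, so the sketch assumes per-dimension standardization across APKs and uses normalized prediction entropy as a stand-in uncertainty measure. The function names `zscore_normalize` and `split_by_uncertainty`, the `threshold` value, and the toy data are all hypothetical.

```python
import numpy as np

def zscore_normalize(embeddings, eps=1e-8):
    """Standardize each embedding dimension to zero mean and unit variance.

    Assumption: statistics are computed per dimension over all APKs
    (rows); the paper may define its normalization differently.
    """
    mean = embeddings.mean(axis=0, keepdims=True)
    std = embeddings.std(axis=0, keepdims=True)
    return (embeddings - mean) / (std + eps)

def split_by_uncertainty(probs, threshold):
    """Split sample indices into (confident, uncertain).

    Assumption: uncertainty is the entropy of the classifier's class
    probabilities, scaled to [0, 1]; the paper's score may differ.
    """
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1) / np.log(probs.shape[1])
    uncertain = entropy > threshold
    return np.flatnonzero(~uncertain), np.flatnonzero(uncertain)

# Toy data standing in for GCN embeddings of 6 APKs in 4 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(loc=5.0, scale=3.0, size=(6, 4))
norm = zscore_normalize(emb)

# Toy class probabilities from a (hypothetical) semi-supervised classifier.
probs = np.array([[0.97, 0.02, 0.01],   # confidently assigned to a family
                  [0.34, 0.33, 0.33]])  # near-uniform: likely an unknown family
confident, uncertain = split_by_uncertainty(probs, threshold=0.5)
```

In this sketch, samples in `confident` would keep the classifier's family label, while samples in `uncertain` would be passed to a clustering algorithm of the reader's choice to group candidates from unknown families.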




Updated: 2021-02-19