CTDroid: Leveraging a Corpus of Technical Blogs for Android Malware Analysis, IEEE Transactions on Reliability

CTDroid: Leveraging a Corpus of Technical Blogs for Android Malware Analysis
IEEE Transactions on Reliability (IF 5.9), Pub Date: 2020-03-01, DOI: 10.1109/tr.2019.2926129
Ming Fan, Xiapu Luo, Jun Liu, Chunyin Nong, Qinghua Zheng, Ting Liu

The rapid growth of Android malware has spurred a large body of approaches that apply machine learning algorithms to malware analysis. However, the effectiveness of these approaches depends primarily on manual feature engineering, which is time-consuming, labor-intensive, and reliant on expert knowledge and intuition. In this paper, we propose an automatic approach that engineers informative features from a corpus of Android malware-related technical blogs, which are written in a way that mirrors the human feature engineering process. This approach faces two main challenges. First, it is difficult to recognize useful knowledge in the massive amount of information contained in thousands of blogs. To address this, we leverage natural language processing techniques to process the blogs and extract a set of sensitive behaviors that could potentially harm users. Second, there is a semantic gap between the extracted sensitive behaviors and the programming language. To bridge it, we propose two semantic matching rules that match the behaviors with concrete code snippets, so that apps can be tested experimentally. We design and implement a system called CTDroid for malware analysis, covering both malware detection (MD) and familial classification (FC). An evaluation of CTDroid on a large-scale set of real malware and benign apps shows that CTDroid achieves a 95.8% true positive rate with only a 1% false positive rate for MD, and 97.9% accuracy for FC. Furthermore, our proposed features are more informative than those of state-of-the-art approaches.
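
To make the idea of turning blog-derived behaviors into app features more concrete, here is a minimal Python sketch under loudly stated assumptions. It is not CTDroid's method: the paper's NLP pipeline and its two semantic matching rules are not described here, so a hand-written lookup table (BEHAVIOR_TO_APIS, hypothetical) stands in for the matching step, behaviors are assumed to be (verb, object) pairs, and the app's API calls are assumed to have been extracted already. The Android method names are real framework APIs, but which behavior each one realizes is purely illustrative. The tpr_fpr helper shows how the true/false positive rates quoted for MD are computed from binary labels.

```python
# Hypothetical sketch: blog-derived behaviors -> binary app features.
# ASSUMPTION: behaviors are (verb, object) pairs and a hand-written table
# replaces CTDroid's (unpublished here) semantic matching rules.

# Illustrative mapping only; the API names are real Android methods,
# but pairing them with these behaviors is an assumption.
BEHAVIOR_TO_APIS = {
    ("send", "sms"): {"android.telephony.SmsManager.sendTextMessage"},
    ("get", "device id"): {"android.telephony.TelephonyManager.getDeviceId"},
    ("get", "location"): {"android.location.LocationManager.getLastKnownLocation"},
    ("load", "code"): {"dalvik.system.DexClassLoader.<init>"},
}

# Fixed, deterministic feature order so every app gets aligned vectors.
BEHAVIORS = sorted(BEHAVIOR_TO_APIS)


def feature_vector(app_api_calls):
    """Return a 0/1 list: 1 if the app calls any API matched to that behavior."""
    calls = set(app_api_calls)
    return [1 if BEHAVIOR_TO_APIS[b] & calls else 0 for b in BEHAVIORS]


def tpr_fpr(y_true, y_pred):
    """True and false positive rates for binary labels (1 = malware, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against a class being absent from the labels.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    app = ["android.telephony.SmsManager.sendTextMessage", "java.lang.String.length"]
    print(BEHAVIORS)
    print(feature_vector(app))  # only the ("send", "sms") feature is set
    print(tpr_fpr([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a real pipeline these vectors would be fed to a classifier for MD or FC; a binary presence encoding is used here only because it is the simplest way to show how matched behaviors become features.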

Updated: 2020-03-01