Focus: Function clone identification on cross-platform
International Journal of Intelligent Systems (IF 5.0), Pub Date: 2021-11-23, DOI: 10.1002/int.22752
Lirong Fu¹, Shouling Ji¹, Changchang Liu², Peiyu Liu¹, Fuzheng Duan¹, Zonghui Wang¹, Whenzhi Chen¹, Ting Wang³

Automatic cross-platform function clone identification aims to determine whether two binary functions are identical without access to their source code. It is a fundamental problem in vulnerability search, code plagiarism detection, and malware classification. With the rapid development of deep neural networks in program analysis, state-of-the-art neural network-based function clone identification methods represent functions as embeddings produced by graph neural networks (GNNs). However, this representation brings two challenges. (1) The feature engineering needed to accurately map raw binary code to machine learning features is complicated. (2) Highly accurate function embeddings require a customized GNN that focuses on the features most critical for identifying binary code. To the best of our knowledge, no comprehensive work yet overcomes both challenges. In this paper, we propose a novel prototype named Focus, which is designed to identify similar functions accurately and efficiently. Specifically, inspired by natural language processing techniques, which effectively learn text semantics across natural languages, Focus learns representative semantic features of functions with a customized learning model. To address the second challenge, Focus employs a multi-head attention mechanism to capture the critical features of a function. Through extensive experiments, we demonstrate that Focus achieves high function clone identification accuracy across eight architectures. In particular, Focus achieves AUC values of 97% and 99% in the cross-platform and single-platform settings, respectively. Furthermore, an evaluation on real-world applications shows that Focus identifies 24 vulnerable functions among the top-30 candidates, twice as many as the baseline approaches.
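The abstract does not spell out Focus's architecture. As a rough illustration of the pipeline it describes (pool the critical features of a function with multi-head attention, compare two function embeddings, then score with AUC or rank top-k candidates), here is a minimal Python sketch. It assumes that each function is already a list of per-node (e.g., per-basic-block) embedding vectors and that each attention head is a single query vector. The helper names `attention_pool`, `cosine`, `auc`, and `top_k` are hypothetical, and the attention-pooling scheme is a generic one, not necessarily the one Focus uses.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(nodes: Sequence[Vector], queries: Sequence[Vector]) -> Vector:
    """Hypothetical multi-head attention pooling.

    Each head scores every node embedding x_i with q_h . x_i / sqrt(d),
    normalises the scores with softmax, and takes the weighted sum of the
    node embeddings. Head outputs are concatenated (len(queries) * d values).
    """
    d = len(nodes[0])
    pooled: Vector = []
    for q in queries:
        weights = softmax([dot(q, x) / math.sqrt(d) for x in nodes])
        head = [sum(w * x[j] for w, x in zip(weights, nodes)) for j in range(d)]
        pooled.extend(head)
    return pooled


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two function embeddings (assumed non-zero)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability that a random positive (clone) pair outscores
    a random negative pair; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k(query: Vector, candidates: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k candidates most similar to the query embedding."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


# Two toy "functions", each a list of 2-d node embeddings.
f1 = [[1.0, 0.0], [0.5, 0.5]]
f2 = [[0.9, 0.1], [0.5, 0.4]]
heads = [[1.0, 0.0], [0.0, 1.0]]  # two fixed, illustrative query vectors
e1, e2 = attention_pool(f1, heads), attention_pool(f2, heads)
print(round(cosine(e1, e2), 3))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

In a trained model the query vectors and node embeddings would be learned; here they are fixed by hand so the example stays deterministic. The `auc` helper uses the pairwise (Mann-Whitney) form of the area under the ROC curve, and `top_k` mirrors the kind of candidate ranking behind a "top-30 candidates" evaluation.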

Updated: 2021-11-23