Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-02-08, DOI: 10.1109/tse.2022.3149240
Huaijin Wang 1, Pingchuan Ma 1, Yuanyuan Yuan 1, Zhibo Liu 1, Shuai Wang 1, Qiyi Tang 2, Sen Nie 2, Shi Wu 2

Binary code function search serves as the basis of various security and software engineering applications, including malware clustering, code clone detection, and vulnerability audits. Recognizing logically similar assembly functions, however, remains a challenge. Most binary code search tools rely on program structure-level information, such as control flow and data flow graphs, that is extracted using program analysis techniques or deep neural networks (DNNs). However, DNN-based techniques capture lexical-, control structure-, or data flow-level information of binary code for representation learning, which is often too coarse-grained and does not accurately denote program functionality. Additionally, such techniques may exhibit low robustness in a variety of challenging settings, such as compiler optimizations and obfuscations. This paper proposes a general solution for enhancing the top-$k$ ranked candidates in DNN-based binary code function search. The key idea is to design a low-cost and comprehensive equivalence check that quickly exposes functionality deviations between the target function and its top-$k$ matched functions. Functions that fail this equivalence check can be removed from the top-$k$ list, and functions that pass the check can be deliberately promoted within the top-$k$ ranking. We design a practical and efficient equivalence check, named BinUSE, using under-constrained symbolic execution (USE). USE, a variant of symbolic execution, improves scalability by initiating symbolic execution directly from function entry points and relaxing constraints on function parameters. It eliminates the overhead incurred by path explosion and costly constraints. BinUSE is specifically designed to deliver an assembly function-level equivalence check, enhancing DNN-based binary code search by reducing its false alarms at low cost.
Our evaluation shows that BinUSE can enable a general and effective enhancement of four state-of-the-art DNN-based binary code search tools when confronted with challenges posed by different compilers, optimizations, obfuscations, and architectures.
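
The filtering and promotion step described in the abstract can be illustrated with a minimal Python sketch. This is not BinUSE's implementation: the function `rerank_top_k`, the three-valued verdict (pass, fail, inconclusive), and the choice to keep inconclusive candidates behind the passing ones are all assumptions made for illustration, and the equivalence check itself is stubbed out as a plain callable.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical verdict of an equivalence check:
# True = passed, False = failed, None = inconclusive (e.g. a timeout).
Verdict = Optional[bool]


def rerank_top_k(
    candidates: List[Tuple[str, float]],
    check: Callable[[str], Verdict],
) -> List[str]:
    """Re-rank DNN top-k candidates with an equivalence-check oracle.

    `candidates` holds (function name, DNN similarity score) pairs.
    Candidates that fail the check are dropped, candidates that pass
    are moved to the front, and inconclusive ones keep their relative
    DNN order behind the passing ones (an assumption of this sketch).
    """
    # Order by DNN score, highest first; Python's sort is stable.
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)

    passed: List[str] = []
    unknown: List[str] = []
    for name, _score in ordered:
        verdict = check(name)
        if verdict is True:
            passed.append(name)
        elif verdict is None:
            unknown.append(name)
        # verdict is False: the candidate is removed from the list.
    return passed + unknown


# Toy usage with made-up names, scores, and verdicts.
top_k = [("f_a", 0.93), ("f_b", 0.91), ("f_c", 0.88), ("f_d", 0.80)]
verdicts = {"f_a": False, "f_b": None, "f_c": True, "f_d": None}
reranked = rerank_top_k(top_k, lambda name: verdicts[name])
print(reranked)
```

Keeping the DNN order inside each group means the check only overrides the learned ranking where it has evidence, which matches the abstract's framing of the check as an enhancement rather than a replacement.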

Updated: 2022-02-08