Bin2vec: learning representations of binary executable programs for security tasks

Cybersecurity (IF 3.9), Pub Date: 2021-07-01, DOI: 10.1186/s42400-021-00088-4
Shushan Arakelyan, Sima Arasteh, Christophe Hauser, Erik Kline, Aram Galstyan

Tackling binary program analysis problems has traditionally meant manually defining rules and heuristics, a tedious and time-consuming task for human analysts. To improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs, applicable to a number of downstream tasks. We introduce Bin2vec, a new approach that leverages Graph Convolutional Networks (GCN) together with computational program graphs to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as to published results, and demonstrate improvement over state-of-the-art methods on both tasks. We evaluated Bin2vec on 49,191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs, each including at least 100 CVE entries, for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code-based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).
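To make the idea of a graph-convolutional representation concrete, here is a minimal sketch of one standard GCN propagation step followed by mean pooling into a single graph-level vector. It is not the authors' implementation: the toy three-node graph, the one-hot node features, the weight matrix, and the function names `matmul`, `gcn_layer`, and `graph_embedding` are all assumptions for illustration, and the abstract does not specify how Bin2vec builds or types the edges of its computational program graphs.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 * H * W).

    Assumes an undirected graph given as a symmetric 0/1 adjacency matrix.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric degree normalization.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def graph_embedding(node_feats):
    """Mean-pool node vectors into one fixed-size vector for the whole graph."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


# Hypothetical toy graph: a path 0 - 1 - 2 with one-hot node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative, not learned

embedding = graph_embedding(gcn_layer(adj, feats, weight))
print(embedding)
```

In a trained model the weight matrix would be learned, several such layers would typically be stacked, and the pooled vector would feed a classifier for a downstream task such as algorithm classification or vulnerability detection.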

Updated: 2021-07-01
