

A Machine-Learning Algorithm with Disjunctive Model for Data-Driven Program Analysis
ACM Transactions on Programming Languages and Systems (IF 1.3), Pub Date: 2019-06-19, DOI: 10.1145/3293607
Minseok Jeon, Sehun Jeong, Sungdeok Cha, Hakjoo Oh


We present a new machine-learning algorithm with a disjunctive model for data-driven program analysis. One major challenge in static program analysis is the substantial manual effort required to tune analysis performance. Data-driven program analysis has recently emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this approach has proven promising for various program analysis tasks, its effectiveness has been limited by simple learning models and algorithms that cannot capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis, together with a learning algorithm that finds the model parameters. Our model uses Boolean formulas over atomic features and can therefore express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, since brute-force search would be impractical. We present a stepwise, greedy algorithm that learns Boolean formulas efficiently. We demonstrate the effectiveness and generality of our algorithm with two static analyzers: a context-sensitive points-to analysis for Java and a flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves on the performance of state-of-the-art techniques, including ones hand-crafted by human experts.
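To make the idea of a disjunctive model concrete, the following is a minimal sketch of a Boolean formula over atomic features learned by a stepwise, greedy search. It is not the algorithm from the article: the abstract does not give its details, so the formula shape (a disjunction of conjunctions), the scoring function (label accuracy on a toy data set), and every name here (`evaluate`, `score`, `refine_clause`, `learn_formula`, `toy_data`) are assumptions made for illustration only.

```python
from itertools import product

# Assumed encoding (hypothetical, not taken from the article):
# a literal is (feature_index, required_value), a clause is a frozenset of
# literals (a conjunction), and a formula is a list of clauses (a disjunction).

def evaluate(formula, features):
    """Return True if at least one clause has all of its literals satisfied."""
    return any(all(features[i] == v for i, v in clause) for clause in formula)

def score(formula, data):
    """Count the examples whose label the formula predicts correctly."""
    return sum(evaluate(formula, x) == y for x, y in data)

def refine_clause(formula, data, literals):
    """Grow one new clause literal by literal while the score strictly improves."""
    clause = frozenset()  # the empty conjunction is always true
    best = score(formula + [clause], data)
    while True:
        candidates = [clause | {lit} for lit in literals
                      if lit not in clause and (lit[0], not lit[1]) not in clause]
        if not candidates:
            return clause, best
        top = max(candidates, key=lambda c: score(formula + [c], data))
        top_score = score(formula + [top], data)
        if top_score <= best:
            return clause, best
        clause, best = top, top_score

def learn_formula(data, n_features, max_clauses=4):
    """Add clauses one at a time, stopping when a new clause no longer helps."""
    literals = [(i, v) for i in range(n_features) for v in (True, False)]
    formula = []
    for _ in range(max_clauses):
        clause, new_score = refine_clause(formula, data, literals)
        if new_score <= score(formula, data):
            break
        formula.append(clause)
    return formula

def toy_data():
    """All 16 inputs, labelled by the made-up target (f0 and f1) or (f2 and f3)."""
    return [(x, (x[0] and x[1]) or (x[2] and x[3]))
            for x in product((False, True), repeat=4)]

if __name__ == "__main__":
    data = toy_data()
    learned = learn_formula(data, n_features=4)
    print(learned, score(learned, data), "of", len(data))
```

The toy target is a disjunction of two conjunctions, which no single linear threshold over the four features can express. The greedy search evaluates only a handful of candidate literals per step instead of enumerating every possible formula, which is the trade-off the abstract alludes to when it calls brute-force search impractical. A greedy search of this kind can get stuck on other targets (exclusive-or is a standard example), so the sketch should be read as an illustration of the idea and not as a general-purpose learner.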


Updated: 2019-06-19
