Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses, Formal Methods in System Design


Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses
Formal Methods in System Design (IF 0.8) Pub Date: 2017-11-21, DOI: 10.1007/s10703-017-0306-7
Kihong Heo, Hakjoo Oh, Hongseok Yang

We present a method for automatically learning an effective strategy for clustering variables for the Octagon analysis from a given codebase. This learned strategy works as a preprocessor of Octagon. Given a program to be analyzed, the strategy is first applied to the program and clusters the variables in it. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work and automatically labels variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a strategy for context-sensitivity of interval analysis.
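The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cluster_variables` and `tracked_pairs`, the `should_track` predicate (standing in for the learned strategy), and the choice to form clusters as connected components of the predicted pairs are all assumptions made for illustration; the abstract does not say how clusters are derived from pairwise predictions.

```python
from itertools import combinations


def cluster_variables(variables, should_track):
    """Group variables into clusters (hypothetical helper).

    Two variables end up in the same cluster when should_track(x, y)
    holds for them directly or through a chain of other variables.
    should_track stands in for the learned strategy; it is an assumption.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y in combinations(variables, 2):
        if should_track(x, y):
            parent[find(x)] = find(y)

    clusters = {}
    for v in variables:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


def tracked_pairs(clusters):
    """Pairs a partial relational analysis would track: same cluster only."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


# Toy input: pairs that a classifier (assumed) predicted as worth tracking.
predicted = {frozenset(("i", "n")), frozenset(("n", "buf_len"))}
variables = ["i", "n", "buf_len", "tmp"]

clusters = cluster_variables(
    variables, lambda x, y: frozenset((x, y)) in predicted
)
pairs = tracked_pairs(clusters)
print(clusters)
```

In this toy input, `tmp` is left in a cluster of its own, so no relationship between `tmp` and any other variable would be tracked, while `i` and `buf_len` are tracked together because both are related to `n`.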


Updated: 2017-11-21
