A Machine-Learning Algorithm with Disjunctive Model for Data-Driven Program Analysis

Authors:
Minseok Jeon, Korea University, Republic of Korea
Sehun Jeong, Korea University, Republic of Korea (ORCID: 0000-0003-4825-4870)
Sungdeok Cha, Korea University, Republic of Korea
Hakjoo Oh, Korea University, Republic of Korea

ACM Transactions on Programming Languages and Systems, Volume 41, Issue 2, Article No. 13, pp. 1–41
https://doi.org/10.1145/3293607

Published: 19 June 2019

Abstract

We present a new machine-learning algorithm with a disjunctive model for data-driven program analysis. One major challenge in static program analysis is the substantial amount of manual effort required to tune analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited by simple learning models and algorithms that cannot capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and is therefore able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search is impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves on the performance of state-of-the-art techniques, including ones hand-crafted by human experts.
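
To make the idea of a disjunctive model over atomic features concrete, the following is a minimal Python sketch and not the algorithm from the article. It assumes a toy setting in which each program element is a vector of Boolean atomic features with a known label, and it uses classification accuracy as a stand-in objective; the article's setting instead judges formulas by their effect on the analyzer. All names (`holds`, `score`, `grow_clause`, `learn`) are hypothetical. The sketch grows one conjunction at a time, literal by literal, and keeps a conjunction only if it improves the objective.

```python
from itertools import product

# A literal is (feature_index, polarity). A clause is a frozenset of literals
# (a conjunction). A formula is a list of clauses (a disjunction, i.e. DNF).

def holds(formula, x):
    """True if some clause of the formula is satisfied by feature vector x."""
    return any(all(x[i] == pol for i, pol in clause) for clause in formula)

def score(formula, data):
    """Toy objective (an assumption): number of correctly labeled examples."""
    return sum(holds(formula, x) == y for x, y in data)

def grow_clause(formula, data, n_features):
    """Greedily extend one conjunction a literal at a time.

    Returns the best clause seen that strictly improves the score of
    `formula`, or None if no such clause was found.
    """
    best_clause, best_score = None, score(formula, data)
    current = frozenset()
    while len(current) < n_features:
        used = {i for i, _ in current}
        candidates = [current | {(i, pol)}
                      for i in range(n_features) if i not in used
                      for pol in (True, False)]
        # Stepwise choice: keep only the best one-literal extension.
        current = max(candidates, key=lambda c: score(formula + [c], data))
        s = score(formula + [current], data)
        if s > best_score:
            best_clause, best_score = current, s
    return best_clause

def learn(data, n_features):
    """Add greedily grown clauses until none improves the objective."""
    formula = []
    while True:
        clause = grow_clause(formula, data, n_features)
        if clause is None:
            return formula
        formula.append(clause)

# Toy data: the label is "feature 0 differs from feature 1"; feature 2 is
# irrelevant. No single conjunction of literals captures this property.
data = [(x, x[0] != x[1]) for x in product((False, True), repeat=3)]
formula = learn(data, 3)
```

On this toy data the target property is inherently disjunctive, so a single conjunction cannot label every example correctly, while a disjunction of two conjunctions can. The search is greedy, so in general it may miss the best formula; it trades optimality for avoiding enumeration of all Boolean formulas.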

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program analysis

Published in
ACM Transactions on Programming Languages and Systems Volume 41, Issue 2
Special Issue on ESOP 2017 and Regular Papers
June 2019
305 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/3320016
Editor:
Andrew Myers
Cornell University, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 June 2019
- Accepted: 1 November 2018
- Revised: 1 October 2018
- Received: 1 December 2017

Author Tags
Data-driven program analysis
context-sensitivity
flow-sensitivity
static analysis
Qualifiers
- research-article
- Research
- Refereed