Abstract
Dynamic languages, such as JavaScript, employ string-to-code primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard, if not impossible, because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyze, are themselves dynamically mutating objects. Nevertheless, assembling code at run-time by manipulating strings, for example with eval in JavaScript, has always been strongly discouraged, since it is often recognized that “eval is evil,” leading static analyzers either to not consider such statements or to ignore their effects. Unfortunately, the lack of formal approaches to analyzing string-to-code statements provides a perfect habitat for malicious code, which is surely evil and does not respect good-practice rules: it can hide its malicious intent in strings that are later converted to code, making static analyses blind to the real aim of the code. Hence, the need to handle string-to-code statements by approximating what they can execute, thereby allowing the analysis to continue (even in the presence of dynamically generated program statements) with an acceptable degree of precision, should be clear. To reach this goal, we propose a static analysis that collects string values and soundly over-approximates and analyzes the code potentially executed by a string-to-code statement.
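To make the idea of over-approximating the argument of a string-to-code statement concrete, the following is a minimal sketch and not the analysis proposed in the paper. It assumes a deliberately simple abstract domain, a finite set of candidate strings with a `TOP` element meaning "any string", and the names `TOP`, `concat`, `join`, and `evalTargets` are hypothetical helpers introduced only for illustration.

```javascript
// Toy abstract string domain (an assumption for illustration):
// a value is either a finite Set of possible strings or TOP (unknown string).
const TOP = Symbol("TOP");

// Abstract concatenation: pair every possible left value with every right value.
function concat(a, b) {
  if (a === TOP || b === TOP) return TOP;
  const out = new Set();
  for (const x of a) for (const y of b) out.add(x + y);
  return out;
}

// Join at a control-flow merge point: keep the possibilities of both branches.
function join(a, b) {
  if (a === TOP || b === TOP) return TOP;
  return new Set([...a, ...b]);
}

// Over-approximation of the code that eval(s) may run. With TOP, a sound
// analysis has to assume that arbitrary code can be executed.
function evalTargets(s) {
  return s === TOP ? TOP : [...s].sort();
}

// Hypothetical program under analysis (never executed here):
//   let op = flag ? "x = 1" : "x = 2";
//   let s  = op + "; y = x";
//   eval(s);
const op = join(new Set(["x = 1"]), new Set(["x = 2"]));
const s = concat(op, new Set(["; y = x"]));
const targets = evalTargets(s);
console.log(targets); // [ 'x = 1; y = x', 'x = 2; y = x' ]
```

Each string in `targets` could then be parsed and analyzed as ordinary code, so the analysis continues past the `eval` instead of giving up. A finite-set domain loses all precision as soon as a string is unknown or built in a loop, which is why a practical analysis would need a richer string abstraction.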
Index Terms
- Analyzing Dynamic Code: A Sound Abstract Interpreter for Evil Eval