Abstract
Mutation-based fuzzing is a simple yet effective technique for discovering bugs and security vulnerabilities in software. Given a set of well-formed initial seeds, mutation-based fuzzers continually generate interesting seeds by applying specific mutation strategies in order to maximize code coverage or the number of unique bugs found at any point in time. However, existing fuzzers remain limited in the paths they can cover because they simply follow a uniform distribution when choosing mutation operators. In this paper, we propose a novel context-aware adaptive mutation scheme, namely CMFuzz, which uses the contextual bandit algorithm LinUCB to choose effective mutation operators for different seed files. To this end, CMFuzz dynamically extracts and encodes file characteristics, which allows mutation-based fuzzers to perform context-aware mutation. We apply this scheme on top of several state-of-the-art fuzzers, i.e., PTfuzz, AFL, and AFLFast, and implement CMFuzz-PT, CMFuzz-AFL, and CMFuzz-AFLFast, respectively. We evaluate these fuzzers against their counterparts on 12 real-world open source applications and the LAVA-M dataset. Extensive evaluations demonstrate that CMFuzz-based fuzzers achieve higher code coverage and find more crashes at a faster rate than their counterparts in most cases. Furthermore, we also use other mainstream bandit algorithms, e.g., Thompson sampling and epsilon-greedy, and implement Thompson-PT and Greedy-PT based on PTfuzz to examine the performance of the proposed model. CMFuzz-PT significantly outperforms Thompson-PT, finding on average 1.79× as many unique crashes and 1.29× as many unique paths. Compared to Greedy-PT, our approach still increases the number of unique crashes and paths by factors of 1.11× and 1.05×, respectively.
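To make the operator-selection idea concrete, the following is a minimal sketch of disjoint LinUCB, in which each arm stands for one mutation operator and the context is a feature vector describing the current seed file. The abstract does not specify the feature encoding, the reward signal, or the exploration parameter, so the class name `LinUCB`, the parameter `alpha`, and the idea of rewarding an operator with 1 when a mutated seed reaches a new path are all assumptions made for illustration, not the authors' implementation.

```python
import math


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (mutation operator)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value, not from the paper)
        # A_inv[a] starts as the identity matrix; b[a] starts as the zero vector.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def score(self, arm, x):
        """Upper confidence bound of the expected reward of `arm` in context x."""
        A_inv_x = self._matvec(self.A_inv[arm], x)
        theta = self._matvec(self.A_inv[arm], self.b[arm])
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, A_inv_x)), 0.0))
        return mean + self.alpha * width

    def select(self, x):
        """Return the index of the arm with the highest score (first on ties)."""
        scores = [self.score(a, x) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Add the observation (x, reward) to the model of `arm`."""
        A_inv = self.A_inv[arm]
        # Sherman-Morrison update of (A + x x^T)^-1; A_inv is symmetric.
        u = self._matvec(A_inv, x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]
```

In a fuzzing loop, one would call `select` with the encoded seed features, apply the chosen operator, and then call `update` with the observed reward. A uniform choice over operators, by contrast, ignores both the features and the feedback.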
References
Adobe (2019) A Basic Distributed Fuzzing Framework for FOE. https://blogs.adobe.com/security/2012/05/a-basic-distributed-fuzzing-framework-for-foe.html. Accessed 5 Dec 2019
Agrawal S, Goyal N (2011) Analysis of Thompson sampling for the multi-armed bandit problem. J Mach Learn Res 23:1–26
Aschermann C, Schumilo S, Blazytko T et al (2019) REDQUEEN: fuzzing with input-to-state correspondence. In: Proceedings of the 2019 network and distributed system security symposium. The Internet Society, San Diego, pp 1–15
Böhme M, Pham V-T, Roychoudhury A (2016) Coverage-based Greybox Fuzzing as Markov Chain. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS’16. ACM Press, New York, pp. 1032–1043
Böhme M, Pham V-T, Nguyen M-D, Roychoudhury A (2017) Directed Greybox Fuzzing. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ‘17. ACM Press, New York, pp. 2329–2344
Böttinger K, Godefroid P, Singh R (2018) Deep Reinforcement Fuzzing. In: Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW). IEEE, San Francisco, pp. 116–122
Cha SK, Woo M, Brumley D (2015) Program-adaptive mutational fuzzing. In: Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, San Jose, pp. 725–741
Chen P, Chen H (2018) Angora: efficient fuzzing by principled search. In: Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, pp 711–725
Chen H, Xue Y, Li Y, et al (2018) Hawkeye: Towards a Desired Directed Grey-box Fuzzer. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, pp. 2095–2108
Dhillon IS, Sra S (2005) Generalized Nonnegative Matrix Approximations with Bregman Divergences. Advances in Neural Information Processing Systems 283–290
Dolan-Gavitt B, Hulin P, Kirda E, et al (2016) LAVA: large-scale automated vulnerability addition. In: Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP). IEEE, San Jose, pp. 110–121
Drozd W, Wagner MD (2018) FuzzerGym: a competitive framework for fuzzing and learning. arXiv e-prints
Gan S, Zhang C, Qin X, et al (2018) CollAFL: path sensitive fuzzing. In: Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, pp. 679–696
Gan S, Zhang C, Chen P, et al (2020) GREYONE: Data Flow Sensitive Fuzzing. In: Proceedings of the 2020 USENIX Security Symposium. USENIX, Boston, pp. 1–18
Godefroid P, Levin MY, Molnar D (2012) SAGE: Whitebox fuzzing for security testing. Commun ACM 55:40–44. https://doi.org/10.1145/2093548.2093564
Godefroid P, Peleg H, Singh R (2017) Learn&Fuzz: machine learning for input fuzzing. In: Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, Urbana-Champaign, pp. 50–59
Google (2019a) ClusterFuzz. https://google.github.io/clusterfuzz/.
Google (2019b) OSS-Fuzz. https://google.github.io/oss-fuzz/.
Han H, Cha SK (2017) IMF: Inferred Model-based Fuzzer. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ‘17. ACM Press, New York, pp. 2345–2358
Intel Corporation (2019) Intel Processor Trace. https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing. Accessed 5 Dec 2019
Jain V, Rawat S, Giuffrida C, Bos H (2018) TIFF: using input type inference to improve fuzzing. In: Proceedings of the 34th Annual Computer Security Applications Conference. ACM, New York, pp. 505–517
Patra J, Pradel M (2016) Learning to fuzz: application-independent fuzz testing with probabilistic, generative models of input data. TU Darmstadt, Department of Computer Science 1:123–129
Karamcheti S, Mann G, Rosenberg D (2018) Adaptive Grey-box fuzz-testing with Thompson sampling. In: Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security - AISec ‘18. ACM Press, New York, pp. 37–47
Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401(6755):788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst:556–562
Lemieux C, Sen K (2018) FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering - ASE 2018. ACM Press, New York, pp. 475–485
Lemieux C, Padhye R, Sen K, Song D (2018) PerfFuzz: Automatically Generating Pathological Inputs. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis - ISSTA 2018. ACM Press, New York, pp. 254–265
Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on World wide web - WWW ‘10. ACM Press, New York, pp. 1–10
Li Y, Chen B, Chandramohan M, et al (2017) Steelix: Program-state based Binary Fuzzing. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017. ACM Press, New York, pp. 627–637
Li Y, Ji S, Lv C, et al (2019) V-fuzz: vulnerability-oriented evolutionary fuzzing. arXiv e-prints 1–16
Libpng (2019) http://www.libpng.org/pub/png/libpng.html. Accessed 13 Dec 2019
Lyu C, Ji S, Li Y, et al (2018) SmartSeed: smart seed generation for efficient fuzzing. arXiv e-prints 1–17
Lyu C, Ji S, Zhang C, et al (2019) MOPT: optimized mutation scheduling for Fuzzers. In: Proceedings of the 2019 USENIX Security Symposium. USENIX, Santa Clara, pp. 1–21
McCaffrey J (2019) The Epsilon-Greedy Algorithm. https://jamesmccaffrey.wordpress.com/2017/11/30/the-epsilon-greedy-algorithm/. Accessed 4 Oct 2019
Peng H, Shoshitaishvili Y, Payer M (2018) T-fuzz: fuzzing by program transformation. In: Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, USA, pp. 697–710
Rajpal M, Blum W, Singh R (2017) Not all bytes are equal: neural byte sieve for fuzzing. arXiv e-prints 1–10
Rawat S, Jain V, Kumar A, et al (2017) VUzzer: application-aware evolutionary fuzzing. In: Proceedings 2017 Network and Distributed System Security Symposium. Internet Society, Reston, VA, pp. 1–13
Rebert A, Cha SK, Avgerinos T, et al (2014) Optimizing seed selection for fuzzing. In: Proceedings of the 2014 USENIX Security Symposium. USENIX, San Diego, pp. 861–875
Schumilo S, Aschermann C, Gawlik R, et al (2017) kAFL: hardware-assisted feedback fuzzing for OS kernels. In: Proceedings of the 2017 USENIX Security Symposium. USENIX, Vancouver, pp. 167–182
Serebryany K (2019) LibFuzzer. http://llvm.org/docs/LibFuzzer.html. Accessed 4 Dec 2019
She D, Pei K, Epstein D, et al (2018) NEUZZ: Efficient Fuzzing with Neural Program Smoothing. arXiv e-prints
Slivkins A (2019) Introduction to multi-armed bandits. Found Trends® Mach Learn 12:1–286. https://doi.org/10.1561/2200000068
Spieker H, Gotlieb A, Marijan D, Mossige M (2017) Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis - ISSTA 2017. ACM Press, New York, pp. 12–22
Stephens N, Grosen J, Salls C, et al (2016) Driller: augmenting fuzzing through selective symbolic execution. In: Proceedings 2016 Network and Distributed System Security Symposium. Internet Society, Reston, VA, pp. 21–24
Wang J, Chen B, Wei L, Liu Y (2017) Skyfire: data-driven seed generation for fuzzing. In: Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP). IEEE, San Jose, pp. 579–594
Wang J, Duan Y, Song W, et al (2019a) Be sensitive and collaborative: analyzing impact of coverage metrics in Greybox fuzzing. In: Proceedings of the 2019 International Symposium on Research in Attacks, Intrusions and Defenses. USENIX, Beijing, pp. 1–15
Wang Y, Wu Z, Wei Q, Wang Q (2019b) NeuFuzz: Efficient Fuzzing with Deep Neural Network. IEEE Access 7:36340–36352. https://doi.org/10.1109/ACCESS.2019.2903291
Watkins CJCH, Dayan P (1992) Technical note: Q-learning. Mach Learn 8:279–292. https://doi.org/10.1023/A:1022676722315
Woo M, Cha SK, Gottlieb S, Brumley D (2013) Scheduling Black-box Mutational Fuzzing. In: Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security - CCS ‘13. ACM Press, New York, pp. 511–522
Xu W, Kashyap S, Min C, Kim T (2017) Designing New Operating Primitives to Improve Fuzzing Performance. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ‘17. ACM Press, New York, pp. 2313–2328
You W, Zong P, Chen K, et al (2017) SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ‘17. ACM Press, New York, pp. 2139–2154
You W, Wang X, Ma S, et al (2019) ProFuzzer: on-the-fly input type probing for better zero-day vulnerability discovery. In: Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, pp. 1–18
Yun I, Lee S, Xu M, et al (2018) QSYM: practical Concolic execution engine tailored for hybrid fuzzing. In: Proceedings of the 2018 USENIX Security Symposium. USENIX, Baltimore, pp. 745–761
Zalewski M (2019) American Fuzzy Lop. http://lcamtuf.coredump.cx/afl/. Accessed 5 April 2019
Zhang G, Zhou X, Luo Y et al (2018) PTfuzz: guided fuzzing with processor trace feedback. IEEE Access 6:37302–37313. https://doi.org/10.1109/ACCESS.2018.2851237
Zhao L, Duan Y, Yin H, Xuan J (2019) Send hardest problems my way: probabilistic path prioritization for hybrid fuzzing. In: Proceedings 2019 Network and Distributed System Security Symposium. The Internet Society, San Diego, pp. 1–15
Acknowledgments
We thank all practitioners who participated in the focus group discussions, and the anonymous reviewers for their constructive comments, which helped improve the paper. This work was supported by the National Key R&D Program of China under Grant 2016QY07X1404.
Additional information
Communicated by: Dan Hao
Cite this article
Wang, X., Hu, C., Ma, R. et al. CMFuzz: context-aware adaptive mutation for fuzzers. Empir Software Eng 26, 10 (2021). https://doi.org/10.1007/s10664-020-09927-3
DOI: https://doi.org/10.1007/s10664-020-09927-3