
CMFuzz: context-aware adaptive mutation for fuzzers

Empirical Software Engineering

Abstract

Mutation-based fuzzing is a simple yet effective technique for discovering bugs and security vulnerabilities in software. Given a set of well-formed initial seeds, mutation-based fuzzers continually generate interesting seeds by applying a specific mutation strategy in order to maximize code coverage or the number of unique bugs found at any point in time. However, existing fuzzers remain limited in the paths they can cover because they simply follow a uniform distribution when choosing mutation operators. In this paper, we propose a novel context-aware adaptive mutation scheme, CMFuzz, which uses the contextual bandit algorithm LinUCB to choose optimal mutation operators for various seed files. To this end, CMFuzz dynamically extracts and encodes file characteristics, which allows mutation-based fuzzers to perform context-aware mutation. We apply this scheme on top of several state-of-the-art fuzzers, i.e., PTfuzz, AFL, and AFLFast, and implement CMFuzz-PT, CMFuzz-AFL, and CMFuzz-AFLFast, respectively. We evaluate them against their counterparts on 12 real-world open-source applications and the LAVA-M dataset. The evaluations show that CMFuzz-based fuzzers achieve higher code coverage and find more crashes at a faster rate than their counterparts in most cases. Furthermore, we also use other mainstream bandit algorithms, e.g., Thompson sampling and epsilon-greedy, to implement Thompson-PT and Greedy-PT based on PTfuzz and examine the performance of the proposed model. CMFuzz-PT significantly outperforms Thompson-PT, especially in terms of unique crashes and paths: it finds 1.79× as many unique crashes and 1.29× as many unique paths on average. Compared to Greedy-PT, our approach still increases the number of unique crashes and paths by 1.11× and 1.05×, respectively.
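
To make the idea of context-aware operator selection concrete, the following is a minimal sketch of the standard disjoint LinUCB rule applied to choosing a mutation operator. It is not the CMFuzz implementation: the abstract does not specify how file characteristics are encoded or how rewards are defined, so the operator names, the two-dimensional context vectors, and the 0/1 reward below are all hypothetical and chosen only for illustration.

```python
import math


class LinUCBArm:
    """Per-operator state for disjoint LinUCB (one arm per mutation operator)."""

    def __init__(self, dim):
        # A = I initially, so A^-1 = I as well; b = 0.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        # Matrix-vector product A^-1 v.
        return [sum(r * c for r, c in zip(row, v)) for row in self.a_inv]

    def score(self, x, alpha):
        # Upper confidence bound: theta.x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(0.0, sum(xi * v for xi, v in zip(x, a_inv_x))))
        return mean + alpha * width

    def update(self, x, reward):
        # A += x x^T, applied directly to A^-1 via Sherman-Morrison; b += reward * x.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select_operator(arms, x, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda i: arms[i].score(x, alpha))


# Hypothetical operator names and seed-file contexts, for illustration only.
OPERATORS = ["bitflip", "arith", "havoc"]
CONTEXTS = {"text-like": [1.0, 0.0], "binary-like": [0.0, 1.0]}
# Toy ground truth: the operator that "finds a new path" for each context.
BEST = {"text-like": 0, "binary-like": 2}


def simulate(rounds=50, alpha=1.0):
    """Alternate between the two contexts and learn from a toy 0/1 reward."""
    arms = [LinUCBArm(dim=2) for _ in OPERATORS]
    kinds = list(CONTEXTS)
    for t in range(rounds):
        kind = kinds[t % len(kinds)]
        x = CONTEXTS[kind]
        chosen = select_operator(arms, x, alpha)
        reward = 1.0 if chosen == BEST[kind] else 0.0  # assumed reward signal
        arms[chosen].update(x, reward)
    return arms
```

In this toy setup, calling `simulate()` and then `select_operator(arms, CONTEXTS["binary-like"])` picks a different operator than it does for `CONTEXTS["text-like"]`, which is the behavior a uniform operator distribution cannot provide. A real fuzzer would replace the toy reward with a signal such as whether the mutated seed exercised a new path, and `alpha` would need tuning to balance exploration against exploitation.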



Acknowledgments

We thank all the practitioners who participated in the focus group discussions and the anonymous reviewers for their constructive comments, which helped improve the paper. This work was supported by the National Key R&D Program of China under Grant 2016QY07X1404.

Author information

Corresponding author

Correspondence to Rui Ma.

Additional information

Communicated by: Dan Hao

About this article

Cite this article

Wang, X., Hu, C., Ma, R. et al. CMFuzz: context-aware adaptive mutation for fuzzers. Empir Software Eng 26, 10 (2021). https://doi.org/10.1007/s10664-020-09927-3

  • DOI: https://doi.org/10.1007/s10664-020-09927-3
