
How verified (or tested) is my code? Falsification-driven verification and testing

Automated Software Engineering

Abstract

Formal verification has advanced to the point that developers can verify the correctness of small, critical modules. Unfortunately, despite considerable effort, determining whether a “verification” verifies what its author intends is still difficult. Previous approaches to this problem are difficult to understand and often limited in applicability. Developers need verification coverage in terms of the software they are verifying, not model checking diagnostics. We propose a methodology that allows developers to determine (and correct) exactly what it is that they have verified, and tools to support that methodology. Our approach is based on a novel variation of mutation analysis and the idea of verification driven by falsification. We use the CBMC model checker to show that this approach applies not only to simple data structures and sorting routines, and to the verification of a routine in Mozilla’s JavaScript engine, but also to understanding an ongoing effort to verify the Linux kernel read-copy-update (RCU) mechanism. Moreover, we show that despite the probabilistic nature of random testing, and the tendency of testing (as opposed to verification) toward incompleteness, the same techniques, with suitable modifications, apply to automated test generation as well as to formal verification. In essence, it is the number of surviving mutants that drives the scalability of our methods, not the underlying method for detecting faults in a program. From the point of view of a Popperian analysis, where an unkilled mutant is a weakness (in terms of falsifiability) in a “scientific theory” of program behavior, only the number of weaknesses that a user must examine matters.
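
To make the core idea concrete, here is a minimal sketch of our own (not code from any of the paper's case studies; sort and the choice of a four-element array are hypothetical). A mutant is a small syntactic change to the program under verification; a mutant that survives, i.e., still passes the verification harness, marks a property the harness never checks. For example, a mutant that replaces every element with 0 survives a harness that asserts only sortedness, pointing directly at the missing check that the output is a permutation of the input:

    #include <assert.h>

    int nondet_int(void);        /* CBMC convention: an unconstrained input */

    void sort(int *a, int n);    /* hypothetical routine under verification */

    int main(void) {
      int a[4];
      for (int i = 0; i < 4; i++)
        a[i] = nondet_int();

      sort(a, 4);

      /* Weak harness: only sortedness is asserted.  A mutant of sort()
         that sets every element to 0 still passes this check, and so
         survives -- exposing the missing permutation property. */
      for (int i = 0; i < 3; i++)
        assert(a[i] <= a[i + 1]);
      return 0;
    }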


Notes

  1. By a harness we mean a program that defines an environment and the form of valid tests, and provides correctness properties; Sketch 1 after these notes gives a minimal illustration.

  2. In fact, the actual code is incorrect, with an access a[i] that does not properly use short-circuiting logical operators to protect array bounds; CBMC detected this, and we fixed it for this paper. Sketch 2 after these notes reconstructs the pattern.

  3. In our own practice, the most common way of setting it is to guess a bound and see if the resulting problem is too large for the available resources; Sketch 3 after these notes illustrates the process.

  4. There is one noted exception in Sect. 4.4.

  5. We show the output of the print statements, not the full CBMC trace: this is what a developer will examine first.

  6. In fact, if we choose a val to check before we assign to ref, we could completely dispense with storing ref at all; Sketch 4 after these notes shows the idiom.

  7. See the issues labeled with TSTL on the pyfakefs GitHub issue tracker for a history of the testing effort.

  8. Our terminology here is not quite Popper’s, which is somewhat difficult to follow without a lengthy introduction to his classification of statements.

  9. In fact, Kaner, Bach, and Pettichord explicitly mention Popper, though only in the context of using tests to refute conjectures about the correctness of software, not in the context of attempting to refute the testing effort itself.

  10. Note that we use a model checking approach that already guarantees non-spurious counterexamples, and provides bounded rather than full verification.
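
Sketch 1 (for note 1): a minimal illustration of what we mean by a harness, in the style of a CBMC harness. The routine find, the array size, and the sortedness restriction are our own hypothetical choices, not code from the paper's case studies.

    #include <assert.h>

    int nondet_int(void);   /* CBMC convention: an unconstrained input */

    /* Hypothetical code under verification: returns an index of x in a,
       or -1 if x is absent. */
    int find(const int *a, int n, int x);

    int main(void) {
      /* Environment: an array with nondeterministic contents. */
      int a[4];
      for (int i = 0; i < 4; i++)
        a[i] = nondet_int();
      /* Form of valid tests: only sorted arrays are legal inputs. */
      __CPROVER_assume(a[0] <= a[1] && a[1] <= a[2] && a[2] <= a[3]);
      int x = nondet_int();
      /* Correctness property. */
      int idx = find(a, 4, x);
      if (idx != -1)
        assert(0 <= idx && idx < 4 && a[idx] == x);
      return 0;
    }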
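
Sketch 2 (for note 2): a hypothetical reconstruction of the kind of bounds error described, not the actual Mozilla code; scan_broken and scan_fixed are invented names.

    #include <stddef.h>

    /* Broken: a[i] is evaluated before the bounds test, so when i
       reaches n the loop reads past the end of the array.  CBMC
       reports this as an array-bounds violation. */
    size_t scan_broken(const int *a, size_t n, int pivot) {
      size_t i = 0;
      while (a[i] <= pivot && i < n)
        i++;
      return i;
    }

    /* Fixed: testing the bound first lets && short-circuit, so a[i]
       is read only when i < n. */
    size_t scan_fixed(const int *a, size_t n, int pivot) {
      size_t i = 0;
      while (i < n && a[i] <= pivot)
        i++;
      return i;
    }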
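
Sketch 3 (for note 3): how the guess-and-check approach to unwinding bounds plays out with CBMC. The harness and the bound of 8 are hypothetical; --unwind and --unwinding-assertions are standard CBMC options.

    /* Run, for example:
         cbmc harness.c --unwind 9 --unwinding-assertions
       If an unwinding assertion fails, the guessed bound was too small;
       if the resulting SAT instance exhausts memory or time, the bound
       (or the harness) must shrink. */
    #include <assert.h>

    int nondet_int(void);

    int main(void) {
      int n = nondet_int();
      __CPROVER_assume(0 <= n && n <= 8);   /* guessed input bound */
      int sum = 0;
      for (int i = 0; i < n; i++)           /* unwound up to the bound */
        sum += i;
      assert(sum >= 0);
      return 0;
    }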
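
Sketch 4 (for note 6): the idiom of nondeterministically choosing the value to check before the code under verification runs; because the chosen index quantifies over all indices, a single remembered value replaces a full reference copy. The names data, k, and val are our own.

    #include <assert.h>

    #define N 4

    int nondet_int(void);

    int main(void) {
      int data[N];
      for (int i = 0; i < N; i++)
        data[i] = nondet_int();

      /* Choose, up front, the single cell we will check, and remember
         only its expected value -- no reference copy of data needed. */
      int k = nondet_int();
      __CPROVER_assume(0 <= k && k < N);
      int val = data[k];

      /* ... operations under verification that must preserve data ... */

      /* Because k is nondeterministic, this one assertion covers every
         index. */
      assert(data[k] == val);
      return 0;
    }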

References

  • Ahmed, I., Gopinath, R., Brindescu, C., Groce, A., Jensen, C.: Can testedness be effectively measured? In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pp. 547–558. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2950290.2950324

  • Ahmed, I., Jensen, C., Groce, A., McKenney, P.E.: Applying mutation analysis on kernel test suites: an experience report. In: International Workshop on Mutation Analysis, pp. 110–115 (2017)

  • Aichernig, B.K.: Model-based mutation testing of reactive systems. In: Theories of Programming and Formal Methods, pp. 23–36. Springer (2013)

  • Alipour, M.A., Groce, A., Zhang, C., Sanadaji, A., Caushik, G.: Finding model-checkable needles in large source code haystacks: Modular bug-finding via static analysis and dynamic invariant discovery. In: International Workshop on Constraints in Formal Verification (2013)

  • Andrews, J.H., Briand, L.C., Labiche, Y.: Is mutation an appropriate tool for testing experiments? In: International Conference on Software Engineering, pp. 402–411 (2005)

  • Andrews, J.H., Groce, A., Weston, M., Xu, R.G.: Random test run length and effectiveness. In: Automated Software Engineering, pp. 19–28 (2008)

  • Andrews, J.H., Briand, L.C., Labiche, Y., Namin, A.S.: Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Trans. Softw. Eng. 32(8), 608 (2006)

  • Arcuri, A., Briand, L.: A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Reliab. 24(3), 219–250 (2014)

  • Auerbach, G., Copty, F., Paruthi, V.: Formal verification of arbiters using property strengthening and underapproximations. In: Formal Methods in Computer-Aided Design, pp. 21–24 (2010)

  • Ball, T., Kupferman, O., Yorsh, G.: Abstraction for falsification. In: Computer Aided Verification, pp. 67–81 (2005)

  • Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: a survey. IEEE Trans. Softw. Eng. 41(5), 507–525 (2015)

  • Bentley, J.: Programming pearls: writing correct programs. Commun. ACM 26(12), 1040–1045 (1983)

  • Black, P.E., Okun, V., Yesha, Y.: Mutation of model checker specifications for test generation and evaluation. In: Mutation 2000, pp. 14–20 (2000)

  • Bloch, J.: Extra, extra - read all about it: nearly all binary searches and mergesorts are broken. http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html (2006)

  • bremen, mrbean, jmcgeheeiv, et al.: pyfakefs implements a fake file system that mocks the Python file system modules. https://github.com/jmcgeheeiv/pyfakefs (2011)

  • Budd, T.A., DeMillo, R.A., Lipton, R.J., Sayward, F.G.: Theoretical and empirical studies on using program mutation to test the functional correctness of programs. In: Principles of Programming Languages, pp. 220–233. ACM (1980)

  • Budd, T.A.: Mutation analysis of program test data. Ph.D. thesis, Yale University, New Haven, CT, USA (1980)

  • Budd, T.A., Lipton, R.J., DeMillo, R.A., Sayward, F.G.: Mutation Analysis. Yale University, Department of Computer Science, New Haven (1979)

  • Buxton, J.N., Randell, B.: Report of a conference sponsored by the NATO Science Committee. In: NATO Software Engineering Conference (1969)

  • Chen, Y., Groce, A., Zhang, C., Wong, W.K., Fern, X., Eide, E., Regehr, J.: Taming compiler fuzzers. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 197–208 (2013)

  • Chockler, H., Gurfinkel, A., Strichman, O.: Beyond vacuity: Towards the strongest passing formula. In: Proceedings of the 2008 International Conference on Formal Methods in Computer-Aided Design, pp. 24:1–24:8 (2008)

  • Chockler, H., Kupferman, O., Kurshan, R.P., Vardi, M.Y.: A practical approach to coverage in model checking. In: Computer Aided Verification, pp. 66–78 (2001)

  • Chockler, H., Kroening, D., Purandare, M.: Computing mutation coverage in interpolation-based model checking. IEEE Trans. CAD Integr. Circuits Syst. 31(5), 765–778 (2012)

  • Clarke, E., Grumberg, O., McMillan, K., Zhao, X.: Efficient generation of counterexamples and witnesses in symbolic model checking. In: Design Automation Conference, pp. 427–432 (1995)

  • Clarke, E.M., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (2000)

  • Cuoq, P., Monate, B., Pacalet, A., Prevosto, V., Regehr, J., Yakobowski, B., Yang, X.: Testing static analyzers with randomly generated programs. In: NASA Formal Methods Symposium, pp. 120–125 (2012)

  • Daran, M., Thévenod-Fosse, P.: Software error analysis: A real case study involving real faults and mutations. In: ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 158–171. ACM (1996)

  • de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340 (2008)

  • DeMillo, R.A., Lipton, R.J., Sayward, F.G.: Hints on test data selection: help for the practicing programmer. Computer 11(4), 34–41 (1978)

  • Desnoyers, M.: [RFC git tree] userspace RCU (urcu) for Linux (2009). http://urcu.so

  • Desnoyers, M., McKenney, P.E., Stern, A., Dagenais, M.R., Walpole, J.: User-level implementations of read-copy update. IEEE Trans. Parallel Distrib. Syst. 23, 375–382 (2012). https://doi.org/10.1109/TPDS.2011.159

  • Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall, Englewood Cliffs, NJ (1976)

  • Een, N., Sorensson, N.: An extensible SAT-solver. In: Symposium on the Theory and Applications of Satisfiability Testing (SAT), pp. 502–518 (2003)

  • Ghassabani, E., Gacek, A., Whalen, M.W., Heimdahl, M.P.E., Wagner, L.G.: Proof-based coverage metrics for formal verification. In: IEEE/ACM International Conference on Automated Software Engineering, pp. 194–199 (2017a)

  • Ghassabani, E., Gacek, A., Whalen, M.W.: Efficient generation of inductive validity cores for safety properties. In: ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 314–325 (2016)

  • Ghassabani, E., Whalen, M.W., Gacek, A.: Efficient generation of all minimal inductive validity cores. In: FMCAD, pp. 31–38 (2017b)

  • Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, A., Marinov, D.: Comparing non-adequate test suites using coverage criteria. In: International Symposium on Software Testing and Analysis, pp. 302–313 (2013)

  • Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, A., Marinov, D.: Guidelines for coverage-based comparisons of non-adequate test suites. ACM Trans. Softw. Eng. Methodol. (accepted for publication)

  • Gopinath, R., Jensen, C., Groce, A.: Code coverage for suite evaluation by developers. In: International Conference on Software Engineering, pp. 72–82 (2014)

  • Groce, A., Ahmed, I., Jensen, C., McKenney, P.E.: How verified is my code? Falsification-driven verification. In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9–13, 2015, pp. 737–748 (2015). https://doi.org/10.1109/ASE.2015.40

  • Groce, A., Alipour, M.A., Gopinath, R.: Coverage and its discontents. In: Onward! Essays, pp. 255–268 (2014)

  • Groce, A., Erwig, M.: Finding common ground: choose, assert, and assume. In: Workshop on Dynamic Analysis, pp. 12–17 (2012)

  • Groce, A., Holmes, J., Marinov, D., Shi, A., Zhang, L.: An extensible, regular-expression-based tool for multi-language mutant generation. In: International Conference on Software Engineering (2018)

  • Groce, A., Holzmann, G., Joshi, R., Xu, R.G.: Putting flight software through the paces with testing, model checking, and constraint-solving. In: Workshop on Constraints in Formal Verification, pp. 1–15 (2008)

  • Groce, A., Holzmann, G., Joshi, R.: Randomized differential testing as a prelude to formal verification. In: International Conference on Software Engineering, pp. 621–631 (2007)

  • Groce, A., Joshi, R.: Exploiting traces in program analysis. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 379–393 (2006)

  • Groce, A., Joshi, R.: Random testing and model checking: Building a common framework for nondeterministic exploration. In: Workshop on Dynamic Analysis, pp. 22–28 (2008)

  • Groce, A., Pinto, J., Azimi, P., Mittal, P., Holmes, J., Kellar, K.: TSTL: the template scripting testing language. https://github.com/agroce/tstl (2015b)

  • Groce, A., Pinto, J., Azimi, P., Mittal, P.: TSTL: a language and tool for testing (demo). In: ACM International Symposium on Software Testing and Analysis, pp. 414–417 (2015a)

  • Groce, A., Pinto, J.: A little language for testing. In: NASA Formal Methods Symposium, pp. 204–218 (2015)

  • Groce, A.: Error explanation with distance metrics. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 108–122 (2004)

  • Groce, A.: Quickly testing the tester via path coverage. In: Workshop on Dynamic Analysis (2009)

  • Groce, A., Kroening, D.: Making the most of BMC counterexamples. Electron. Notes Theor. Comput. Sci. 119(2), 67–81 (2005)

  • Guniguntala, D., McKenney, P.E., Triplett, J., Walpole, J.: The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux. IBM Syst. J. 47(2), 221–236 (2008)

  • Holmes, J., Groce, A., Pinto, J., Mittal, P., Azimi, P., Kellar, K., O’Brien, J.: TSTL: the template scripting testing language. Int. J. Softw. Tools Technol. Transf. (2016). https://doi.org/10.1007/s10009-016-0445-y. (Online first)

  • Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980)

  • Hoskote, Y., Kam, T., Ho, P.H., Zhao, X.: Coverage estimation for symbolic model checking. In: ACM/IEEE Design Automation Conference, pp. 300–305 (1999)

  • Hume, D.: An Enquiry Concerning Human Understanding. Routledge, London (1748)

  • Just, R., Jalali, D., Inozemtseva, L., Ernst, M.D., Holmes, R., Fraser, G.: Are mutants a valid substitute for real faults in software testing? In: ACM SIGSOFT Symposium on Foundations of Software Engineering, pp. 654–665 (2014)

  • Kaner, C., Bach, J., Pettichord, B.: Lessons Learned in Software Testing: A Context-Driven Approach. Wiley, Hoboken (2001)

  • Kroening, D., Clarke, E.M., Lerda, F.: A tool for checking ANSI-C programs. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 168–176 (2004)

  • Kroening, D., Strichman, O.: Efficient computation of recurrence diameters. In: Verification, Model Checking, and Abstract Interpretation, pp. 298–309 (2003)

  • Kupferman, O., Li, W., Seshia, S.: A theory of mutations with applications to vacuity, coverage, and fault tolerance. In: Formal Methods in Computer-Aided Design, pp. 1–9 (2008)

  • Lakatos, I.: The role of crucial experiments in science. Stud. Hist. Philos. Sci. Part A 4(4), 309–325 (1974)

  • Lawlor, R.: quicksort.c. http://www.comp.dit.ie/rlawlor/Alg_DS/sorting/quickSort.c (2015). Referenced April 20, 2015

  • Lee, T.C., Hsiung, P.A.: Mutation coverage estimation for model checking. In: Automated Technology for Verification and Analysis, pp. 354–368 (2004)

  • Lipton, R.J.: Fault diagnosis of computer programs. Technical Report, Carnegie Mellon University (1971)

  • Liu, X.: muupi mutation tool. https://github.com/apepkuss/muupi (2016)

  • Mathur, A.P.: Foundations of Software Testing. Addison-Wesley, Boston (2012)

  • Mathur, A.P., Wong, W.E.: An empirical comparison of data flow and mutation-based test adequacy criteria. J. Softw. Test. Verif. Reliab. 4(1), 9–31 (1994)

  • McKeeman, W.: Differential testing for software. Dig. Tech. J. Dig. Equip. Corp. 10(1), 100–107 (1998)

  • McKenney, P.E., Eggemann, D., Randhawa, R.: Improving energy efficiency on asymmetric multiprocessing systems (2013). https://www.usenix.org/system/files/hotpar13-poster8-mckenney.pdf

  • McKenney, P.E., Slingwine, J.D.: Read-copy update: Using execution history to solve concurrency problems. In: Parallel and Distributed Computing and Systems, pp. 509–518. Las Vegas, NV (1998)

  • McKenney, P.E.: RCU Linux usage (2006). Available: http://www.rdrop.com/users/paulmck/RCU/linuxusage.html (Viewed January 14, 2007)

  • McKenney, P.E.: RCU torture test operation. https://www.kernel.org/doc/Documentation/RCU/torture.txt

  • McKenney, P.E.: Re: [PATCH fyi] RCU: the bloatwatch edition (2009). Available: http://lkml.org/lkml/2009/1/14/449 (Viewed January 15, 2009)

  • McKenney, P.E.: Verification challenge 4: Tiny RCU. http://paulmck.livejournal.com/39343.html (2015a)

  • McKenney, P.E.: Verification challenge 5: Uses of RCU. http://paulmck.livejournal.com/39793.html (2015b)

  • McKenney, P.E.: Structured deferral: synchronization via procrastination. Commun. ACM 56(7), 40–49 (2013). https://doi.org/10.1145/2483852.2483867

  • Murugesan, A., Whalen, M.W., Rungta, N., Tkachuk, O., Person, S., Heimdahl, M.P.E., You, D.: Are we there yet? Determining the adequacy of formalized requirements and test suites. In: NASA Formal Methods Symposium, pp. 279–294 (2015)

  • Offutt, A.J., Voas, J.M.: Subsumption of condition coverage techniques by mutation testing. Technical Report ISSE-TR-96-01, Information and Software Systems Engineering, George Mason University (1996)

  • Papadakis, M., Jia, Y., Harman, M., Traon, Y.L.: Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique. In: International Conference on Software Engineering (2015)

  • Popper, K.: The Logic of Scientific Discovery. Hutchinson, London (1959)

  • Popper, K.: Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, London (1963)

  • Schuler, D., Zeller, A.: Assessing oracle quality with checked coverage. In: International Conference on Software Testing, Verification and Validation, pp. 90–99 (2011)

  • Schuler, D., Zeller, A.: Checked coverage: an indicator for oracle quality. Softw. Test. Verif. Reliab. 23(7), 531–551 (2013)

  • scvalex: Finding all paths of minimum length to a node using Dijkstra's algorithm. https://compprog.wordpress.com/2008/01/17/finding-all-paths-of-minimum-length-to-a-node-using-dijkstras-algorithm/ (2008)

  • Stout, R.: If Death Ever Slept. Viking (1957)

  • Strichman, O., Godlin, B.: Regression verification – a practical way to verify programs. In: Verified Software: Theories, Tools, Experiments, pp. 496–501 (2008)

  • Tassarotti, J., Dreyer, D., Vafeiadis, V.: Verifying read-copy-update in a logic for weak memory. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2015). (To appear)

  • Tip, F.: A survey of program slicing techniques. J. Program. Lang. 3, 121–189 (1995)

  • visar: [SOLVED] doubly linked list insertion sort in C. http://www.linuxquestions.org/questions/programming-9/doubly-linked-list-insertion-sort-in-c-4175415860/ (2012)

  • Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking programs. Autom. Softw. Eng. 10(2), 203–232 (2003)

  • Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 283–294 (2011)

  • Zhang, X., Gupta, R., Zhang, Y.: Precise dynamic slicing algorithms. In: International Conference on Software Engineering, pp. 319–329 (2003)

Acknowledgements

A portion of this work was funded by NSF Grants CCF-1217824 and CCF-1054786.

Author information

Corresponding author

Correspondence to Alex Groce.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Groce, A., Ahmed, I., Jensen, C. et al. How verified (or tested) is my code? Falsification-driven verification and testing. Autom Softw Eng 25, 917–960 (2018). https://doi.org/10.1007/s10515-018-0240-y
