Abstract
Formal verification has advanced to the point that developers can verify the correctness of small, critical modules. Unfortunately, despite considerable effort, determining whether a “verification” verifies what its author intends remains difficult. Previous approaches are hard to understand and often limited in applicability. Developers need verification coverage expressed in terms of the software they are verifying, not model-checking diagnostics. We propose a methodology, and tools to support it, that allow developers to determine (and correct) what it is that they have verified. Our basic approach is a novel variation of mutation analysis combined with the idea of verification driven by falsification. We use the CBMC model checker to show that this approach applies not only to simple data structures, sorting routines, and a routine in Mozilla’s JavaScript engine, but also to understanding an ongoing effort to verify the Linux kernel’s read-copy-update (RCU) mechanism. Moreover, we show that despite the probabilistic nature of random testing and the inherent incompleteness of testing as opposed to verification, the same techniques, with suitable modifications, apply to automated test generation as well as to formal verification. In essence, it is the number of surviving mutants that drives the scalability of our methods, not the underlying method for detecting faults in a program. From the point of view of a Popperian analysis, where an unkilled mutant is a weakness (in terms of its falsifiability) in a “scientific theory” of program behavior, it is only the number of weaknesses to be examined by a user that matters.
Notes
By a harness we mean a program that defines an environment and the form of valid tests, and provides correctness properties.
In fact, the actual code is incorrect, with an access a[i] that does not properly use short-circuiting logical operators to protect against out-of-bounds indices; CBMC detected this, and we fixed it for this paper.
In our own practice, the most common way of setting it is to guess a bound and see if the resulting problem is too large for the available resources.
There is one noted exception in Sect. 4.4.
We show the output of the print statements, not the full CBMC trace: this is what a developer will examine first.
In fact, if we choose a val to check before we assign to ref, we could completely dispense with storing ref at all.
See the issues labeled with TSTL on the pyfakefs GitHub issue tracker for a history of the testing effort.
Our terminology here is not quite Popper’s, which is somewhat difficult to follow without a lengthy introduction to his classification of statements.
In fact, Kaner, Bach, and Pettichord explicitly mention Popper, though only in the context of using tests to refute conjectures about the correctness of software, not in the context of attempting to refute the testing effort itself.
Note that we use a model checking approach that already guarantees non-spurious counterexamples, and provides bounded rather than full verification.
References
Ahmed, I., Gopinath, R., Brindescu, C., Groce, A., Jensen, C.: Can testedness be effectively measured? In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pp. 547–558. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2950290.2950324
Ahmed, I., Jensen, C., Groce, A., McKenney, P.E.: Applying mutation analysis on kernel test suites: an experience report. In: International Workshop on Mutation Analysis, pp. 110–115 (2017)
Aichernig, B.K.: Model-based mutation testing of reactive systems. In: Theories of Programming and Formal Methods, pp. 23–36. Springer (2013)
Alipour, M.A., Groce, A., Zhang, C., Sanadaji, A., Caushik, G.: Finding model-checkable needles in large source code haystacks: Modular bug-finding via static analysis and dynamic invariant discovery. In: International Workshop on Constraints in Formal Verification (2013)
Andrews, J.H., Briand, L.C., Labiche, Y.: Is mutation an appropriate tool for testing experiments? In: International Conference on Software Engineering, pp. 402–411 (2005)
Andrews, J.H., Groce, A., Weston, M., Xu, R.G.: Random test run length and effectiveness. In: Automated Software Engineering, pp. 19–28 (2008)
Andrews, J.H., Briand, L.C., Labiche, Y., Namin, A.S.: Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Trans. Softw. Eng. 32(8), 608 (2006)
Arcuri, A., Briand, L.: A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Reliab. 24(3), 219–250 (2014)
Auerbach, G., Copty, F., Paruthi, V.: Formal verification of arbiters using property strengthening and underapproximations. In: Formal Methods in Computer-Aided Design, pp. 21–24 (2010)
Ball, T., Kupferman, O., Yorsh, G.: Abstraction for falsification. In: Computer Aided Verification, pp. 67–81 (2005)
Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: a survey. IEEE Trans. Softw. Eng. 41(5), 507–525 (2015)
Bentley, J.: Programming pearls: writing correct programs. Commun. ACM 26(12), 1040–1045 (1983)
Black, P.E., Okun, V., Yesha, Y.: Mutation of model checker specifications for test generation and evaluation. In: Mutation 2000, pp. 14–20 (2000)
Bloch, J.: Extra, extra - read all about it: nearly all binary searches and mergesorts are broken. http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html (2006)
bremen, mrbean, jmcgeheeiv, et al.: pyfakefs implements a fake file system that mocks the Python file system modules. https://github.com/jmcgeheeiv/pyfakefs (2011)
Budd, T.A., DeMillo, R.A., Lipton, R.J., Sayward, F.G.: Theoretical and empirical studies on using program mutation to test the functional correctness of programs. In: Principles of Programming Languages, pp. 220–233. ACM (1980)
Budd, T.A.: Mutation analysis of program test data. Ph.D. thesis, Yale University, New Haven, CT, USA (1980)
Budd, T.A., Lipton, R.J., DeMillo, R.A., Sayward, F.G.: Mutation Analysis. Yale University, Department of Computer Science, New Haven (1979)
Buxton, J.N., Randell, B.: Report of a conference sponsored by the NATO science committee. In: NATO Software Engineering Conference, vol. 1969 (1969)
Chen, Y., Groce, A., Zhang, C., Wong, W.K., Fern, X., Eide, E., Regehr, J.: Taming compiler fuzzers. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 197–208 (2013)
Chockler, H., Gurfinkel, A., Strichman, O.: Beyond vacuity: Towards the strongest passing formula. In: Proceedings of the 2008 International Conference on Formal Methods in Computer-Aided Design, pp. 24:1–24:8 (2008)
Chockler, H., Kupferman, O., Kurshan, R.P., Vardi, M.Y.: A practical approach to coverage in model checking. In: Computer Aided Verification, pp. 66–78 (2001)
Chockler, H., Kroening, D., Purandare, M.: Computing mutation coverage in interpolation-based model checking. IEEE Trans. CAD Integr. Circuits Syst. 31(5), 765–778 (2012)
Clarke, E., Grumberg, O., McMillan, K., Zhao, X.: Efficient generation of counterexamples and witnesses in symbolic model checking. In: Design Automation Conference, pp. 427–432 (1995)
Clarke, E.M., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (2000)
Cuoq, P., Monate, B., Pacalet, A., Prevosto, V., Regehr, J., Yakobowski, B., Yang, X.: Testing static analyzers with randomly generated programs. In: NASA Formal Methods Symposium, pp. 120–125 (2012)
Daran, M., Thévenod-Fosse, P.: Software error analysis: A real case study involving real faults and mutations. In: ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 158–171. ACM (1996)
de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340 (2008)
DeMillo, R.A., Lipton, R.J., Sayward, F.G.: Hints on test data selection: help for the practicing programmer. Computer 11(4), 34–41 (1978)
Desnoyers, M.: [RFC git tree] userspace RCU (urcu) for Linux (2009). http://urcu.so
Desnoyers, M., McKenney, P.E., Stern, A., Dagenais, M.R., Walpole, J.: User-level implementations of read-copy update. IEEE Trans. Parallel Distrib. Syst. 23, 375–382 (2012). https://doi.org/10.1109/TPDS.2011.159
Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall, Englewood Cliffs, NJ (1976)
Eén, N., Sörensson, N.: An extensible SAT-solver. In: Symposium on the Theory and Applications of Satisfiability Testing (SAT), pp. 502–518 (2003)
Ghassabani, E., Gacek, A., Whalen, M.W., Heimdahl, M.P.E., Wagner, L.G.: Proof-based coverage metrics for formal verification. In: IEEE/ACM International Conference on Automated Software Engineering, pp. 194–199 (2017a)
Ghassabani, E., Gacek, A., Whalen, M.W.: Efficient generation of inductive validity cores for safety properties. In: ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 314–325 (2016)
Ghassabani, E., Whalen, M.W., Gacek, A.: Efficient generation of all minimal inductive validity cores. In: FMCAD, pp. 31–38 (2017b)
Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, A., Marinov, D.: Comparing non-adequate test suites using coverage criteria. In: International Symposium on Software Testing and Analysis, pp. 302–313 (2013)
Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, A., Marinov, D.: Guidelines for coverage-based comparisons of non-adequate test suites. ACM Trans. Softw. Eng. Methodol. (accepted for publication)
Gopinath, R., Jensen, C., Groce, A.: Code coverage for suite evaluation by developers. In: International Conference on Software Engineering, pp. 72–82 (2014)
Groce, A., Ahmed, I., Jensen, C., McKenney, P.E.: How verified is my code? falsification-driven verification. In: 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9–13, 2015, pp. 737–748 (2015). https://doi.org/10.1109/ASE.2015.40
Groce, A., Alipour, M.A., Gopinath, R.: Coverage and its discontents. In: Onward! Essays, pp. 255–268 (2014)
Groce, A., Erwig, M.: Finding common ground: choose, assert, and assume. In: Workshop on Dynamic Analysis, pp. 12–17 (2012)
Groce, A., Holmes, J., Marinov, D., Shi, A., Zhang, L.: An extensible, regular-expression-based tool for multi-language mutant generation. In: International Conference on Software Engineering (2018)
Groce, A., Holzmann, G., Joshi, R., Xu, R.G.: Putting flight software through the paces with testing, model checking, and constraint-solving. In: Workshop on Constraints in Formal Verification, pp. 1–15 (2008)
Groce, A., Holzmann, G., Joshi, R.: Randomized differential testing as a prelude to formal verification. In: International Conference on Software Engineering, pp. 621–631 (2007)
Groce, A., Joshi, R.: Exploiting traces in program analysis. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 379–393 (2006)
Groce, A., Joshi, R.: Random testing and model checking: Building a common framework for nondeterministic exploration. In: Workshop on Dynamic Analysis, pp. 22–28 (2008)
Groce, A., Pinto, J., Azimi, P., Mittal, P., Holmes, J., Kellar, K.: TSTL: the template scripting testing language. https://github.com/agroce/tstl (2015b)
Groce, A., Pinto, J., Azimi, P., Mittal, P.: TSTL: a language and tool for testing (demo). In: ACM International Symposium on Software Testing and Analysis, pp. 414–417 (2015a)
Groce, A., Pinto, J.: A little language for testing. In: NASA Formal Methods Symposium, pp. 204–218 (2015)
Groce, A.: Error explanation with distance metrics. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 108–122 (2004)
Groce, A.: Quickly testing the tester via path coverage. In: Workshop on Dynamic Analysis (2009)
Groce, A., Kroening, D.: Making the most of BMC counter examples. Electron. Notes Theor. Comput. Sci. 119(2), 67–81 (2005)
Guniguntala, D., McKenney, P.E., Triplett, J., Walpole, J.: The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux. IBM Syst. J. 47(2), 221–236 (2008)
Holmes, J., Groce, A., Pinto, J., Mittal, P., Azimi, P., Kellar, K., O’Brien, J.: TSTL: the template scripting testing language. Int. J. Softw. Tools Technol. Transf. (2016). https://doi.org/10.1007/s10009-016-0445-y. (Online first)
Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980)
Hoskote, Y., Kam, T., Ho, P.H., Zhao, X.: Coverage estimation for symbolic model checking. In: ACM/IEEE Design Automation Conference, pp. 300–305 (1999)
Hume, D.: An Enquiry Concerning Human Understanding. Routledge, London (1748)
Just, R., Jalali, D., Inozemtseva, L., Ernst, M.D., Holmes, R., Fraser, G.: Are mutants a valid substitute for real faults in software testing? In: ACM SIGSOFT Symposium on Foundations of Software Engineering, pp. 654–665 (2014)
Kaner, C., Bach, J., Pettichord, B.: Lessons Learned in Software Testing: A Context-Driven Approach. Wiley, Hoboken (2001)
Kroening, D., Clarke, E.M., Lerda, F.: A tool for checking ANSI-C programs. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 168–176 (2004)
Kroening, D., Strichman, O.: Efficient computation of recurrence diameters. In: Verification, Model Checking, and Abstract Interpretation, pp. 298–309 (2003)
Kupferman, O., Li, W., Seshia, S.: A theory of mutations with applications to vacuity, coverage, and fault tolerance. In: Formal Methods in Computer-Aided Design, pp. 1–9 (2008)
Lakatos, I.: The role of crucial experiments in science. Stud. His. Philos. Sci. Part A 4(4), 309–325 (1974)
Lawlor, R.: quicksort.c. http://www.comp.dit.ie/rlawlor/Alg_DS/sorting/quickSort.c. Accessed April 20, 2015
Lee, T.C., Hsiung, P.A.: Mutation coverage estimation for model checking. In: Automated Technology for Verification and Analysis, pp. 354–368 (2004)
Lipton, R.J.: Fault diagnosis of computer programs. Technical Report, Carnegie Mellon University (1971)
Liu, X.: muupi mutation tool. https://github.com/apepkuss/muupi (2016)
Mathur, A.P.: Foundations of Software Testing. Addison-Wesley, Boston (2012)
Mathur, A.P., Wong, W.E.: An empirical comparison of data flow and mutation-based test adequacy criteria. J. Softw. Test. Verif. Reliab. 4(1), 9–31 (1994)
McKeeman, W.: Differential testing for software. Dig. Tech. J. Dig. Equip. Corp. 10(1), 100–107 (1998)
McKenney, P.E., Eggemann, D., Randhawa, R.: Improving energy efficiency on asymmetric multiprocessing systems (2013). https://www.usenix.org/system/files/hotpar13-poster8-mckenney.pdf
McKenney, P.E., Slingwine, J.D.: Read-copy update: Using execution history to solve concurrency problems. In: Parallel and Distributed Computing and Systems, pp. 509–518. Las Vegas, NV (1998)
McKenney, P.E.: RCU Linux usage (2006). Available: http://www.rdrop.com/users/paulmck/RCU/linuxusage.html (Viewed January 14, 2007)
McKenney, P.E.: RCU torture test operation. https://www.kernel.org/doc/Documentation/RCU/torture.txt
McKenney, P.E.: Re: [PATCH fyi] RCU: the bloatwatch edition (2009). Available: http://lkml.org/lkml/2009/1/14/449 (Viewed January 15, 2009)
McKenney, P.E.: Verification challenge 4: Tiny RCU. http://paulmck.livejournal.com/39343.html (2015a)
McKenney, P.E.: Verification challenge 5: Uses of RCU. http://paulmck.livejournal.com/39793.html (2015b)
McKenney, P.E.: Structured deferral: synchronization via procrastination. Commun. ACM 56(7), 40–49 (2013). https://doi.org/10.1145/2483852.2483867
Murugesan, A., Whalen, M.W., Rungta, N., Tkachuk, O., Person, S., Heimdahl, M.P.E., You, D.: Are we there yet? determining the adequacy of formalized requirements and test suites. In: NASA Formal Methods Symposium, pp. 279–294 (2015)
Offutt, A.J., Voas, J.M.: Subsumption of condition coverage techniques by mutation testing. In: Technical Report ISSE-TR-96-01, Information and Software Systems Engineering, George Mason University (1996)
Papadakis, M., Jia, Y., Harman, M., Traon, Y.L.: Trivial compiler equivalence: A large scale empirical study of a simple fast and effective equivalent mutant detection technique. In: International Conference on Software Engineering (2015)
Popper, K.: The Logic of Scientific Discovery. Hutchinson, London (1959)
Popper, K.: Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, London (1963)
Schuler, D., Zeller, A.: Assessing oracle quality with checked coverage. In: International Conference on Software Testing, Verification and Validation, pp. 90–99 (2011)
Schuler, D., Zeller, A.: Checked coverage: an indicator for oracle quality. Softw. Test. Verif. Reliab. 23(7), 531–551 (2013)
scvalex: Finding all paths of minimum length to a node using dijkstras algorithm. https://compprog.wordpress.com/2008/01/17/finding-all-paths-of-minimum-length-to-a-node-using-dijkstras-algorithm/ (2008)
Stout, R.: If Death Ever Slept. Viking (1957)
Strichman, O., Godlin, B.: Regression verification: a practical way to verify programs. In: Verified Software: Theories, Tools, Experiments, pp. 496–501 (2008)
Tassarotti, J., Dreyer, D., Vafeiadis, V.: Verifying read-copy-update in a logic for weak memory. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2015). (To appear)
Tip, F.: A survey of program slicing techniques. J. Program. Lang. 3, 121–189 (1995)
visar: [SOLVED] doubly linked list insertion sort in C. http://www.linuxquestions.org/questions/programming-9/doubly-linked-list-insertion-sort-in-c-4175415860/ (2012)
Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking programs. Autom. Softw. Eng. 10(2), 203–232 (2003)
Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 283–294 (2011)
Zhang, X., Gupta, R., Zhang, Y.: Precise dynamic slicing algorithms. In: International Conference on Software Engineering, pp. 319–329 (2003)
Acknowledgements
A portion of this work was funded by NSF Grants CCF-1217824 and CCF-1054786.
Cite this article
Groce, A., Ahmed, I., Jensen, C. et al. How verified (or tested) is my code? Falsification-driven verification and testing. Autom Softw Eng 25, 917–960 (2018). https://doi.org/10.1007/s10515-018-0240-y