An approach and benchmark to detect behavioral changes of commits in continuous integration

Empirical Software Engineering

Abstract

When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do not introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes in commits. As input, it takes a program, its test suite, and a commit. Its output is a set of test methods that capture the behavioral difference between the pre-commit and post-commit versions of the program. We call our approach DCI (Detecting behavioral changes in CI). It works by generating variations of the existing test cases through (i) assertion amplification and (ii) a search-based exploration of the input space. We evaluate our approach on a curated set of 60 commits from 6 open source Java projects. To our knowledge, this is the first ever curated dataset of real-world behavioral changes. Our evaluation shows that DCI is able to generate test methods that detect behavioral changes. Our approach is fully automated and can be integrated into current development processes. The main limitations are that it targets unit tests and works on a relatively small fraction of commits. More specifically, DCI works on commits that have a unit test that already executes the modified code. In practice, from our benchmark projects, we found 15.29% of commits to meet the conditions required by DCI.
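
To make the core idea concrete, the following is a minimal, self-contained Java sketch of the selection criterion described above. It is not the authors' DCI/DSpot implementation; all names (preCommit, postCommit, amplifyInputs, detectChanges) are illustrative assumptions. The sketch amplifies a few seed inputs and keeps those on which a pre-commit and a post-commit version of a method disagree; such inputs are the ones that would be turned into test methods asserting the new behavior.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    public class BehavioralChangeSketch {

        // Pre-commit behavior: trims the input.
        static String preCommit(String s) { return s.trim(); }

        // Post-commit behavior: trims and lower-cases (the behavioral change).
        static String postCommit(String s) { return s.trim().toLowerCase(); }

        // Stand-in for search-based input exploration: trivial mutations of seed inputs.
        static List<String> amplifyInputs(List<String> seeds) {
            List<String> amplified = new ArrayList<>(seeds);
            for (String s : seeds) {
                amplified.add(s.toUpperCase());
                amplified.add("  " + s + "  ");
            }
            return amplified;
        }

        // Keep only the inputs on which the two versions disagree.
        static List<String> detectChanges(List<String> inputs,
                                          Function<String, String> pre,
                                          Function<String, String> post) {
            List<String> witnesses = new ArrayList<>();
            for (String in : inputs) {
                if (!pre.apply(in).equals(post.apply(in))) {
                    witnesses.add(in);
                }
            }
            return witnesses;
        }

        public static void main(String[] args) {
            List<String> inputs = amplifyInputs(List.of("Hello", "world"));
            // Prints the amplified inputs that expose the behavioral change.
            System.out.println(detectChanges(inputs,
                    BehavioralChangeSketch::preCommit,
                    BehavioralChangeSketch::postCommit));
        }
    }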

Notes

  1. https://github.com/STAMP-project/dspot-experiments

  2. https://github.com/xwiki/xwiki-commons/commit/7e79f77

  3. By default, nb = 3.

  4. https://github.com/STAMP-project/dspot.git

  5. We are aware that behavioral changes can be introduced in other ways, such as modifying dependencies or configuration files (Hilton et al. 2018).

  6. https://github.com/STAMP-project/dspot-experiments/tree/master/src/main/python/april-2019

  7. https://github.com/apache/commons-lang/commit/3fadfdd

  8. https://github.com/apache/commons-io/commit/c6b8a38

  9. https://github.com/apache/commons-lang/commit/f56931c

  10. https://github.com/google/gson/commit/9e6f2ba

  11. https://github.com/jhy/jsoup/commit/e9feec9

  12. https://github.com/spullara/mustache.java/commit/88718bc

  13. https://github.com/xwiki/xwiki-commons/commit/848c984

  14. For a side-by-side comparison, see https://danglotb.github.io/resources/dci/index.html

  15. https://github.com/apache/commons-io/commit/81210eb

  16. https://github.com/apache/commons-lang/commit/e7d16c2

  17. https://github.com/google/gson/commit/44cad04

  18. Interestingly, the number is parsed lazily, only when needed. Consequently, the exception is thrown when invoking the longValue() method and not when invoking parse(); a simplified sketch of this lazy-parsing pattern is given after this list of notes.

  19. https://github.com/jhy/jsoup/commit/3676b13

  20. https://github.com/spullara/mustache.java/commit/774ae7a

  21. https://github.com/xwiki/xwiki-commons/commit/d3101ae
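
The lazy-parsing behavior mentioned in note 18 can be illustrated with a short, self-contained Java sketch. It mirrors the pattern the note describes (a number wrapper that defers parsing until a value is requested) and is not Gson's actual implementation; the class and method names here are assumptions made for illustration only.

    public class LazyNumberSketch {

        // A number wrapper that stores the raw token and defers parsing.
        static final class LazyNumber extends Number {
            private final String raw;
            LazyNumber(String raw) { this.raw = raw; }

            @Override public long longValue() { return Long.parseLong(raw); } // parsing happens here
            @Override public int intValue() { return (int) longValue(); }
            @Override public double doubleValue() { return Double.parseDouble(raw); }
            @Override public float floatValue() { return (float) doubleValue(); }
        }

        // "parse" only wraps the token; it never validates it.
        static LazyNumber parse(String token) {
            return new LazyNumber(token);
        }

        public static void main(String[] args) {
            LazyNumber n = parse("not-a-number"); // no exception here
            try {
                n.longValue();                    // NumberFormatException is thrown here
            } catch (NumberFormatException e) {
                System.out.println("thrown at longValue(), not at parse(): " + e.getMessage());
            }
        }
    }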

References

  • Anand S, Pasareanu CS, Visser W (2007) JPF-SE: A symbolic execution extension to Java PathFinder

  • Böhme M, Roychoudhury A (2014) Corebench: Studying complexity of regression errors. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. ACM, pp 105–115

  • Cadar C, Dunbar D, Engler D (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08. USENIX Association, Berkeley, pp 209–224

  • Campos J, Arcuri A, Fraser G, Abreu R (2014) Continuous test generation: Enhancing continuous integration with automated test generation. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14. ACM, pp 55–66

  • Danglot B, Vera-Pérez OL, Baudry B, Monperrus M (2019) Automatic test improvement with DSpot: a study with ten mature open-source projects. Empirical Software Engineering

  • Daniel B, Jagannath V, Dig D, Marinov D (2009) Reassert: Suggesting repairs for broken unit tests. In: 2009 IEEE/ACM International conference on automated software engineering, pp 433–444

  • Duvall PM, Matyas S, Glover A (2007) Continuous integration: improving software quality and reducing risk. Pearson Education

  • Evans RB, Savoia A (2007) Differential testing: a new approach to change detection. In: The 6th joint meeting on european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: Companion papers. ACM, pp 549–552

  • Falleri J-R, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and Accurate Source Code Differencing. In: Proceedings of the International Conference on Automated Software Engineering, pp 313–324

  • Fowler M, Foemmel M (2006) Continuous integration. ThoughtWorks 122:14. https://www.thoughtworks.com/continuous-integration

  • Fraser G, Arcuri A (2012) The seed is strong: Seeding strategies in search-based software testing. In: 2012 IEEE fifth international conference on Software testing, verification and validation (ICST). IEEE, pp 121–130

  • Godefroid P, Klarlund N, Sen K (2005) Dart: directed automated random testing. In: ACM Sigplan notices. ACM, vol 40, pp 213–223

  • Groce A, Holzmann G, Joshi R (2007) Randomized differential testing as a prelude to formal verification. In: Proceedings of the 29th international conference on Software Engineering. IEEE Computer Society, pp 621–631

  • Hilton M, Tunnell T, Huang K, Marinov D, Dig D (2016) Usage, costs, and benefits of continuous integration in open-source projects. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016. ACM, New York, pp 426–437

  • Hilton M, Bell J, Marinov D (2018) A large-scale study of test coverage evolution. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018. ACM, New York, pp 53–63

  • Jin W, Orso A, Xie T (2010) Automated behavioral regression testing. In: 2010 Third international conference on software testing, verification and validation, pp 137–146

  • Kuchta T, Palikareva H, Cadar C (2018) Shadow symbolic execution for testing software patches. ACM Trans Softw Eng Methodol 27(3):10:1–10:32

  • Lahiri S, McMillan K, Hawblitzel C (2013) Differential assertion checking. Technical report

  • Madeiral F, Urli S, Maia M, Monperrus M (2019) Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies. In: Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’19)

  • Marinescu PD, Cadar C (2013) KATCH: high-coverage testing of software patches. ACM Press, pp 235

  • Menarini M, Yan Y, Griswold WG (2017) Semantics-assisted code review: an efficient tool chain and a user study. In: 2017 32nd IEEE/ACM international conference on automated software engineering (ASE), pp 554–565

  • Noller Y, Nguyen HL, Tang M, Kehrer T (2018) Shadow symbolic execution with Java PathFinder. SIGSOFT Softw Eng Notes 42(4):1–5

  • Palikareva H, Kuchta T, Cadar C (2016) Shadow of a doubt: testing for divergences between software versions. In: Proceedings of the 38th International Conference on Software Engineering. ACM, pp 1181–1192

  • Person S, Dwyer MB, Elbaum S, Pǎsǎreanu CS (2008) Differential symbolic execution. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT ’08/FSE-16. ACM, New York, pp 226–237

  • Saff D, Ernst MD (2004) An experimental evaluation of continuous testing during development. In: ACM SIGSOFT Software engineering notes. ACM, vol 29, pp 76–85

  • Spieker H, Gotlieb A, Marijan D, Mossige M (2017) Reinforcement learning for automatic test case prioritization and selection in continuous integration. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2017. ACM, New York, pp 12–22

  • Taneja K, Xie T (2008) Diffgen: Automated regression unit-test generation. In: Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE ’08. IEEE Computer Society, Washington, pp 407–410

  • Tonella P (2004) Evolutionary testing of classes. In: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA ’04. ACM, New York, pp 119–128

  • Urli S, Yu Z, Seinturier L, Monperrus M (2018) How to Design a Program Repair Bot? Insights from the Repairnator Project. In: ICSE 2018 - 40th international conference on software engineering, track software engineering in practice (SEIP), pp 1–10

  • Vera-Pérez OL, Danglot B, Monperrus M, Baudry B (2018) A comprehensive study of pseudo-tested methods. Empirical Software Engineering

  • Voas JM, Miller KW (1995) Software testability: the new verification. IEEE Softw 12(3):17–28

  • Waller J, Ehmke NC, Hasselbring W (2015) Including performance benchmarks into continuous integration to enable devops. SIGSOFT Softw Eng Notes 40(2):1–4

  • Xie T (2006) Augmenting automatically generated unit-test suites with regression oracle checking. In: Thomas D (ed) ECOOP 2006 – Object-Oriented Programming. Springer, Berlin, pp 380–403

  • Yang G, Khurshid S, Person S, Rungta N (2014) Property differencing for incremental checking. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014. ACM, New York, pp 1059–1070

  • Yoo S, Harman M (2012) Test data regeneration: Generating new test data from existing test data. Softw Test Verif Reliab 22(3):171–201

  • Zampetti F, Scalabrino S, Oliveto R, Canfora G, Penta MD (2017) How open source projects use static code analysis tools in continuous integration pipelines. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR), pp 334–344

  • Zhang P, Elbaum S (2012) Amplifying tests to validate exception handling code. In: Proceedings of the International Conference on Software Engineering (ICSE). IEEE Press, pp 595–605

Author information

Corresponding author: Benjamin Danglot.

Additional information

Communicated by: Tao Yue

About this article

Cite this article

Danglot, B., Monperrus, M., Rudametkin, W. et al. An approach and benchmark to detect behavioral changes of commits in continuous integration. Empir Software Eng 25, 2379–2415 (2020). https://doi.org/10.1007/s10664-019-09794-7
