An approach and benchmark to detect behavioral changes of commits in continuous integration

Empirical Software Engineering

Abstract

When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do not introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes in commits. As input, it takes a program, its test suite, and a commit. Its output is a set of test methods that capture the behavioral difference between the pre-commit and post-commit versions of the program. We call our approach DCI (Detecting behavioral changes in CI). It works by generating variations of the existing test cases through (i) assertion amplification and (ii) a search-based exploration of the input space. We evaluate our approach on a curated set of 60 commits from 6 open source Java projects. To our knowledge, this is the first ever curated dataset of real-world behavioral changes. Our evaluation shows that DCI is able to generate test methods that detect behavioral changes. Our approach is fully automated and can be integrated into current development processes. The main limitations are that it targets unit tests and works on a relatively small fraction of commits. More specifically, DCI works on commits that have a unit test that already executes the modified code. In practice, from our benchmark projects, we found 15.29% of commits to meet the conditions required by DCI.
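
To make the core idea concrete, the following is a minimal, self-contained Java sketch of the selection criterion described above. It is not the authors' DCI/DSpot implementation; all names (preCommit, postCommit, amplifyInputs, detectChanges) are illustrative assumptions. The sketch amplifies a few seed inputs and keeps those on which a pre-commit and a post-commit version of a method disagree; such inputs are the ones that would be turned into test methods asserting the new behavior.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    public class BehavioralChangeSketch {

        // Pre-commit behavior: trims the input.
        static String preCommit(String s) { return s.trim(); }

        // Post-commit behavior: trims and lower-cases (the behavioral change).
        static String postCommit(String s) { return s.trim().toLowerCase(); }

        // Stand-in for search-based input exploration: trivial mutations of seed inputs.
        static List<String> amplifyInputs(List<String> seeds) {
            List<String> amplified = new ArrayList<>(seeds);
            for (String s : seeds) {
                amplified.add(s.toUpperCase());
                amplified.add("  " + s + "  ");
            }
            return amplified;
        }

        // Keep only the inputs on which the two versions disagree.
        static List<String> detectChanges(List<String> inputs,
                                          Function<String, String> pre,
                                          Function<String, String> post) {
            List<String> witnesses = new ArrayList<>();
            for (String in : inputs) {
                if (!pre.apply(in).equals(post.apply(in))) {
                    witnesses.add(in);
                }
            }
            return witnesses;
        }

        public static void main(String[] args) {
            List<String> inputs = amplifyInputs(List.of("Hello", "world"));
            // Prints the amplified inputs that expose the behavioral change.
            System.out.println(detectChanges(inputs,
                    BehavioralChangeSketch::preCommit,
                    BehavioralChangeSketch::postCommit));
        }
    }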

Notes

  1. https://github.com/STAMP-project/dspot-experiments

  2. https://github.com/xwiki/xwiki-commons/commit/7e79f77

  3. By default, nb = 3.

  4. https://github.com/STAMP-project/dspot.git

  5. We are aware that behavioral changes can be introduced in other ways, such as modifying dependencies or configuration files (Hilton et al. 2018).

  6. https://github.com/STAMP-project/dspot-experiments/tree/master/src/main/python/april-2019

  7. https://github.com/apache/commons-lang/commit/3fadfdd

  8. https://github.com/apache/commons-io/commit/c6b8a38

  9. https://github.com/apache/commons-lang/commit/f56931c

  10. https://github.com/google/gson/commit/9e6f2ba

  11. https://github.com/jhy/jsoup/commit/e9feec9

  12. https://github.com/spullara/mustache.java/commit/88718bc

  13. https://github.com/xwiki/xwiki-commons/commit/848c984

  14. For a side-by-side comparison, see https://danglotb.github.io/resources/dci/index.html

  15. https://github.com/apache/commons-io/commit/81210eb

  16. https://github.com/apache/commons-lang/commit/e7d16c2

  17. https://github.com/google/gson/commit/44cad04

  18. Interestingly, the number is parsed lazily, only when needed. Consequently, the exception is thrown when invoking the longValue() method and not when invoking parse(); a simplified sketch of this lazy-parsing pattern is given after this list of notes.

  19. https://github.com/jhy/jsoup/commit/3676b13

  20. https://github.com/spullara/mustache.java/commit/774ae7a

  21. https://github.com/xwiki/xwiki-commons/commit/d3101ae
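
The lazy-parsing behavior mentioned in note 18 can be illustrated with a short, self-contained Java sketch. It mirrors the pattern the note describes (a number wrapper that defers parsing until a value is requested) and is not Gson's actual implementation; the class and method names here are assumptions made for illustration only.

    public class LazyNumberSketch {

        // A number wrapper that stores the raw token and defers parsing.
        static final class LazyNumber extends Number {
            private final String raw;
            LazyNumber(String raw) { this.raw = raw; }

            @Override public long longValue() { return Long.parseLong(raw); } // parsing happens here
            @Override public int intValue() { return (int) longValue(); }
            @Override public double doubleValue() { return Double.parseDouble(raw); }
            @Override public float floatValue() { return (float) doubleValue(); }
        }

        // "parse" only wraps the token; it never validates it.
        static LazyNumber parse(String token) {
            return new LazyNumber(token);
        }

        public static void main(String[] args) {
            LazyNumber n = parse("not-a-number"); // no exception here
            try {
                n.longValue();                    // NumberFormatException is thrown here
            } catch (NumberFormatException e) {
                System.out.println("thrown at longValue(), not at parse(): " + e.getMessage());
            }
        }
    }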

References

  • Anand S, Pasareanu CS, Visser W (2007) JPF-SE: A symbolic execution extension to Java PathFinder

  • Böhme M, Roychoudhury A (2014) Corebench: Studying complexity of regression errors. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. ACM, pp 105–115

  • Cadar C, Dunbar D, Engler D (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08. USENIX Association, Berkeley, pp 209–224

  • Campos J, Arcuri A, Fraser G, Abreu R (2014) Continuous test generation: Enhancing continuous integration with automated test generation. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14. ACM, pp 55–66

  • Danglot B, Vera-Pérez OL, Baudry B, Monperrus M (2019) Automatic test improvement with DSpot: a study with ten mature open-source projects. Empirical Software Engineering

  • Daniel B, Jagannath V, Dig D, Marinov D (2009) Reassert: Suggesting repairs for broken unit tests. In: 2009 IEEE/ACM International conference on automated software engineering, pp 433–444

  • Duvall PM, Matyas S, Glover A (2007) Continuous integration: improving software quality and reducing risk. Pearson Education

  • Evans RB, Savoia A (2007) Differential testing: a new approach to change detection. In: The 6th joint meeting on european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: Companion papers. ACM, pp 549–552

  • Falleri J-R, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and Accurate Source Code Differencing. In: Proceedings of the International Conference on Automated Software Engineering, pp 313–324

  • Fowler M, Foemmel M (2006) Continuous integration. ThoughtWorks 122:14. https://www.thoughtworks.com/continuous-integration

  • Fraser G, Arcuri A (2012) The seed is strong: Seeding strategies in search-based software testing. In: 2012 IEEE fifth international conference on Software testing, verification and validation (ICST). IEEE, pp 121–130

  • Godefroid P, Klarlund N, Sen K (2005) Dart: directed automated random testing. In: ACM Sigplan notices. ACM, vol 40, pp 213–223

  • Groce A, Holzmann G, Joshi R (2007) Randomized differential testing as a prelude to formal verification. In: Proceedings of the 29th international conference on Software Engineering. IEEE Computer Society, pp 621–631

  • Hilton M, Tunnell T, Huang K, Marinov D, Dig D (2016) Usage, costs, and benefits of continuous integration in open-source projects. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016. ACM, New York, pp 426–437

  • Hilton M, Bell J, Marinov D (2018) A large-scale study of test coverage evolution. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018. ACM, New York, pp 53–63

  • Jin W, Orso A, Xie T (2010) Automated behavioral regression testing. In: 2010 Third international conference on software testing, verification and validation, pp 137–146

  • Kuchta T, Palikareva H, Cadar C (2018) Shadow symbolic execution for testing software patches. ACM Trans Softw Eng Methodol 27(3):10:1–10:32

  • Lahiri S, McMillan K, Hawblitzel C (2013) Differential assertion checking. Technical report

  • Madeiral F, Urli S, Maia M, Monperrus M (2019) Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies. In: Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’19)

  • Marinescu PD, Cadar C (2013) KATCH: high-coverage testing of software patches. ACM Press, pp 235

  • Menarini M, Yan Y, Griswold WG (2017) Semantics-assisted code review: an efficient tool chain and a user study. In: 2017 32nd IEEE/ACM international conference on automated software engineering (ASE), pp 554–565

  • Noller Y, Nguyen HL, Tang M, Kehrer T (2018) Shadow symbolic execution with Java PathFinder. SIGSOFT Softw Eng Notes 42(4):1–5

  • Palikareva H, Kuchta T, Cadar C (2016) Shadow of a doubt: testing for divergences between software versions. In: Proceedings of the 38th International Conference on Software Engineering. ACM, pp 1181–1192

  • Person S, Dwyer MB, Elbaum S, Pǎsǎreanu CS (2008) Differential symbolic execution. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT ’08/FSE-16. ACM, New York, pp 226–237

  • Saff D, Ernst MD (2004) An experimental evaluation of continuous testing during development. In: ACM SIGSOFT Software engineering notes. ACM, vol 29, pp 76–85

  • Spieker H, Gotlieb A, Marijan D, Mossige M (2017) Reinforcement learning for automatic test case prioritization and selection in continuous integration. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2017. ACM, New York, pp 12–22

  • Taneja K, Xie T (2008) Diffgen: Automated regression unit-test generation. In: Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE ’08. IEEE Computer Society, Washington, pp 407–410

  • Tonella P (2004) Evolutionary testing of classes. In: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA ’04. ACM, New York, pp 119–128

  • Urli S, Yu Z, Seinturier L, Monperrus M (2018) How to Design a Program Repair Bot? Insights from the Repairnator Project. In: ICSE 2018 - 40th international conference on software engineering, track software engineering in practice (SEIP), pp 1–10

  • Vera-Pérez OL, Danglot B, Monperrus M, Baudry B (2018) A comprehensive study of pseudo-tested methods. Empirical Software Engineering

  • Voas JM, Miller KW (1995) Software testability: the new verification. IEEE Softw 12(3):17–28

  • Waller J, Ehmke NC, Hasselbring W (2015) Including performance benchmarks into continuous integration to enable devops. SIGSOFT Softw Eng Notes 40(2):1–4

  • Xie T (2006) Augmenting automatically generated unit-test suites with regression oracle checking. In: Thomas D (ed) ECOOP 2006 – Object-Oriented Programming. Springer, Berlin, pp 380–403

  • Yang G, Khurshid S, Person S, Rungta N (2014) Property differencing for incremental checking. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014. ACM, New York, pp 1059–1070

  • Yoo S, Harman M (2012) Test data regeneration: Generating new test data from existing test data. Softw Test Verif Reliab 22(3):171–201

  • Zampetti F, Scalabrino S, Oliveto R, Canfora G, Penta MD (2017) How open source projects use static code analysis tools in continuous integration pipelines. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR), pp 334–344

  • Zhang P, Elbaum S (2012) Amplifying tests to validate exception handling code. In: Proceedings of the International Conference on Software Engineering (ICSE). IEEE Press, pp 595–605

Author information

Corresponding author: Benjamin Danglot.

Additional information

Communicated by: Tao Yue

About this article

Cite this article

Danglot, B., Monperrus, M., Rudametkin, W. et al. An approach and benchmark to detect behavioral changes of commits in continuous integration. Empir Software Eng 25, 2379–2415 (2020). https://doi.org/10.1007/s10664-019-09794-7
