Studying test-driven development and its retainment over a six-month time span☆
Introduction
Test-Driven Development (TDD) (Beck, 2003, Astels, 2003) is a cyclic development approach where unit tests drive the incremental development of small pieces of functionality (Erdogmus et al., 2010). Each development cycle starts with the writing of unit tests for an unimplemented piece of functionality. A cycle ends when the new unit tests pass, along with the existing regression test suite. Refactoring plays an important role in the process underlying TDD. It allows a TDD practitioner to improve the internal structure of the code, as well as its design, while preserving the external behavior of the code thanks to the safety net the existing regression test suite provides (Astels, 2003). The end of a cycle allows a TDD practitioner to tackle a new piece of functionality, not yet implemented, thus starting a new development cycle (Beck, 2003, Astels, 2003). Advocates of TDD recommend ending a development cycle in a few minutes (five or ten minutes, according to Jeffries and Melnik, 2007) and keeping the rhythm as uniform as possible over time (Beck, 2003, Erdogmus et al., 2010). The order in which unit tests and production code are written in TDD (i.e., a test is written before the corresponding production code) is known as test-first sequencing (or test-first dynamic) (Fucci et al., 2017). It is worth noting that test-first sequencing refers to just one central aspect of TDD (Karac and Turhan, 2018). That is, it does not capture the full nature of TDD (Fucci et al., 2017). Other central aspects that characterize the development process underlying TDD are: granularity, uniformity, and refactoring effort (Fucci et al., 2017). Granularity refers to the duration of the development cycles, while uniformity reflects how constant their duration is over time (Fucci et al., 2017). Finally, refactoring effort captures how much refactoring a TDD practitioner performs.
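To make the notions of granularity and uniformity more concrete, the following is a minimal sketch of how the two could be computed from the durations of the development cycles in one session. It assumes that granularity is summarized by the median cycle duration and uniformity by the standard deviation of the cycle durations; these operationalizations, the function names, and the sample data are illustrative assumptions and are not taken from this paper.

```python
from statistics import median, pstdev

# Hypothetical data: duration, in minutes, of each development cycle
# observed in a single programming session.
cycle_minutes = [4.0, 6.0, 5.0, 9.0, 6.0]


def granularity(durations):
    """Typical cycle duration (assumed here to be the median, in minutes).

    A lower value means shorter, finer-grained cycles.
    """
    return median(durations)


def uniformity(durations):
    """Spread of cycle durations (assumed here to be the population
    standard deviation, in minutes).

    A value of zero means that all cycles lasted exactly the same time.
    """
    return pstdev(durations)


if __name__ == "__main__":
    print("granularity (min):", granularity(cycle_minutes))
    print("uniformity (min):", round(uniformity(cycle_minutes), 2))
```

Under these assumptions, a practitioner who follows the recommended rhythm would show a granularity of a few minutes and a uniformity close to zero.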
It is claimed that TDD leads to higher-quality software products in terms of both external (i.e., functional) and internal quality, while increasing developers’ productivity (Beck, 2003). These claimed benefits have encouraged some software companies to adopt TDD, while others are considering its adoption (Tosun et al., 2017). TDD has been assessed from a quantitative point of view (e.g., Fucci et al. (2016), Erdogmus et al. (2005)) and from a qualitative perspective (e.g., Romano et al. (2016), Scanniello et al. (2016)). A number of primary studies, like experiments or case studies, have been conducted on TDD (Fucci et al., 2016, Erdogmus et al., 2005, George and Williams, 2004, Bhat and Nagappan, 2006, Nagappan et al., 2008). Their results, gathered and combined in a number of secondary studies (Karac and Turhan, 2018, Bissi et al., 2016, Fucci et al., 2015, Turhan et al., 2010, Munir et al., 2014, Rafique and Mišić, 2013), do not fully support the claimed benefits of TDD (i.e., while some primary studies have shown that TDD improves the quality of software products and/or developers’ productivity, other primary studies have not). Some researchers have conjectured that long-term observations are needed to see the claimed benefits of TDD and/or to better understand this development approach; therefore, they have recommended taking a longitudinal approach when investigating TDD, i.e., studying TDD over a time span (Fucci et al., 2015, Munir et al., 2014, Shull et al., 2010, Müller and Höfer, 2007). Nevertheless, only Latorre (2014), Borle et al. (2017), Beller et al. (2017), and Marchenko et al. (2009) have taken a longitudinal approach.
Longitudinal studies employ continuous or repeated measures to follow particular individuals over a time span of weeks, months, or even years (Caruana et al., 2015). In this paper, we present a study on TDD that takes a longitudinal approach. In particular, we conducted a longitudinal cohort study in which our cohort consisted of 30 novice developers of homogeneous experience who attended the same training regarding agile software development, including TDD. The design of our study allowed us to have a term of comparison between TDD and a non-TDD approach, defined as the approach that developers would normally follow (e.g., iterative test-last, big-bang testing, or no testing at all, but not TDD), with respect to external quality, developers’ productivity, number of tests written, and fault-detection capability of tests written. Moreover, thanks to our cohort, we collected separate measurements of the same constructs (i.e., external quality, developers’ productivity, number of tests written, fault-detection capability of tests written, test-first sequencing, granularity, uniformity, and refactoring effort) (about) six months apart with the goal of understanding how well TDD can be applied over time, giving an indication of its retainment (or retention).
While we did not find any improvement, due to TDD, in the external quality of software products and developers’ productivity, we observed that TDD allows creating larger test suites with a higher fault-detection capability. Moreover, our results indicate that novice developers retain TDD for at least six months (i.e., the time span from when novice developers learned and applied TDD for the first time to when they applied TDD again).
This paper extends the one by Fucci et al. (2018) as follows:
- Since Fucci et al. had found that TDD leads developers to write more tests, we studied whether writing more tests implies that the fault-detection capability of those tests is actually better. This was to strengthen the conclusions from Fucci et al.’s study. It is worth mentioning that we studied both effect and retainment of TDD with respect to fault-detection capability of written tests.
- We investigated the retainment of TDD with respect to four aspects that characterize the process underlying TDD: test-first sequencing, granularity, uniformity, and refactoring effort.
- We extended the inferential statistics by applying a second statistical model. This allowed us to mitigate, as much as possible, threats to the conclusion validity of the results shown in Fucci et al.’s paper.
Paper structure. In Section 2, we outline work related to ours. We present our study in Section 3. The obtained results are presented and discussed in Section 4 and Section 5, respectively. Final remarks conclude the paper.
Section snippets
Related work
The effect of TDD on several outcomes – including functional quality and productivity, which are of interest for this study – has been the topic of several empirical studies, summarized in Systematic Literature Reviews (SLRs) and meta-analyses (Bissi et al., 2016, Turhan et al., 2010, Munir et al., 2014, Rafique and Mišić, 2013). The SLR by Turhan et al. (2010) includes 32 primary studies (e.g., controlled experiments and case studies) published from 2000 to 2009. The gathered evidence shows a
Empirical study
The goal of a longitudinal study is to investigate “how certain conditions change over time” (Yin, 2009). Therefore, the data collection happens over a time span and can require the researchers to be co-located with the case and context in which the phenomenon of interest takes place. In the context of software engineering, longitudinal studies are often associated with the case study methodology. In other cases, longitudinal studies are employed to observe the impact of a potentially
Results
In the remainder of this section, we first present the results from the descriptive statistics and exploratory analyses, and then we provide the results from the inferential statistics.
Discussion
In this section, we first answer the RQs to delineate the main findings of our cohort study. We then discuss these findings and present their practical implications. Finally, we discuss the threats that might have affected the validity of these findings.
Conclusion
In this paper, we present the results from a (quantitative) longitudinal cohort study with 30 novice developers to investigate the effect of TDD, as compared to a non-TDD approach, as well as the retainment of TDD over a time span of (about) six months.
As for the comparison of TDD with a non-TDD approach, we show that TDD has no effect on external quality of software products and developers’ productivity. However, we observed that the participants practicing TDD wrote significantly more unit
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to thank the students for their participation in our study.
References (51)
- et al. Besouro: A framework for exploring compliance rules in automatic TDD behavior assessment. Inf. Softw. Technol. (2015)
- et al. The effects of test driven development on internal quality, external quality and productivity: A systematic review. Inf. Softw. Technol. (2016)
- et al. Towards an operationalization of test-driven development skills: An industrial empirical study. Inf. Softw. Technol. (2015)
- et al. A structured experiment of test-driven development. Inf. Softw. Technol. (2004)
- et al. Considering rigor and relevance when evaluating test driven development: A systematic review. Inf. Softw. Technol. (2014)
- et al. Findings from a multi-method study on test-driven development. Inf. Softw. Technol. (2017)
- Test Driven Development: A Practical Guide (2003)
- et al. Building knowledge through families of experiments. IEEE Trans. Softw. Eng. (1999)
- Test-Driven Development: By Example (2003)
- et al. Developer testing in the IDE: Patterns, beliefs, and behavior. IEEE Trans. Softw. Eng. (2017)
- Evaluating the efficacy of test-driven development: Industrial case studies
- Analyzing the effects of test driven development in GitHub. Empir. Softw. Eng.
- Longitudinal studies. J. Thorac. Dis.
- Issues in using students in empirical studies in software engineering education
- Factors limiting industrial adoption of test driven development: A systematic review
- Experimental and Quasi-Experimental Designs for Generalized Causal Inference
- Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir. Softw. Eng.
- Test-driven development
- On the effectiveness of the test-first approach to programming. IEEE Trans. Softw. Eng.
- A dissection of the test-driven development process: Does it really matter to test-first or to test-last? IEEE Trans. Softw. Eng.
- A longitudinal cohort study on the retainment of test-driven development
- An external replication on the effects of test-driven development using a multi-site blind analysis approach
- A replicated experiment on the effectiveness of test-first development
- Using students as subjects—A comparative study of students and professionals in lead-time impact assessment. Empir. Softw. Eng.
- Reporting experiments in software engineering
Maria Teresa Baldassarre received her Laurea degree with honors in informatics at the University of Bari, Italy, where she also received her PhD. She is currently an assistant professor. Her research interests focus on empirical software engineering, software quality assurance, and human factors in software engineering. She is responsible for several international research collaborations and is a partner of the SER&Practices spin-off company. Currently, she is the representative of the University of Bari in the International Software Engineering Research Network (ISERN). She is also involved in various program committees related to software engineering and empirical software engineering.
Danilo Caivano is currently an Associate Professor of software engineering and project management with the Department of Computer Science, University of Bari Aldo Moro, and a Consultant for companies and organizations, especially in the field of research and development projects. He is also the Head of the SERLAB Research Laboratory and the Director of the short master in cyber security. He contributed to the creation of The Hack Space, Cyber Security Laboratory, University of Bari. He is also a member of the Board of Directors of the Southern Italy Chapter Project Management Institute, the Co-Ordinator of the PMISIC Academy, and a member of the Technical Scientific Committee of the Apulian Information Technology District and of the IT Strategic Steering Committee.
Davide Fucci is an Assistant Professor at Blekinge Institute of Technology (Sweden). He received his Ph.D. from the University of Oulu (Finland) in 2016. He has a strong background in empirical studies in software engineering, publishing and serving as committee member and reviewer for several venues in the field (e.g., ESEM, EMSE, IEEE TSE). Currently, his research interests lie in data-driven requirements engineering, test automation, and human aspects of software development. He started the AffectRE workshop series on emotional awareness in requirements engineering at RE and co-organized the SEmotion’19 workshop at ICSE. He is involved with the Software Engineering ReThought project at BTH and with the H2020 OpenReq project at the University of Hamburg. More on: orcid.org/0000-0002-0679-4361.
Natalia Juristo (grise.upm.es/miembros/natalia) has been a full professor of software engineering with the Computing School at the Technical University of Madrid (UPM) since 1997. Natalia held a FiDiPro (Finland Distinguished Professor) research grant at the University of Oulu from January 2013 to June 2018. She was the Director of the UPM M.Sc. in Software Engineering from 1992 to 2002 and the coordinator of the Erasmus Mundus European Master on SE (with the participation of the University of Bolzano, the University of Kaiserslautern and the University of Blekinge) from 2007 to 2012. Natalia will be General Chair for ICSE 2021 to be held in Madrid. She has served on several Program Committees, including ICSE, RE, REFSQ, ESEM, ISESE, and others. She has been Program Chair for EASE 2013, ISESE 2004 and SEKE 1997 and General Chair for ESEM 2007, SNPD 2002, and SEKE 2001. She has been a member of several editorial boards, including TSE (Jan 2013–Dec 2017), EMSE (since 2002) and Software magazine (1997 to 2001), among others. Natalia has been guest editor of special issues in several journals, including EMSE, IEEE Software, JSS, DKE, and Int J Softw Eng Knowl Eng. Natalia has been ranked number 10 (among the experienced SE researchers) in a paper published in January 2019 at JSS that evaluates the 2010–2017 period.
Simone Romano received his master’s degree in Computer Engineering from the University of Basilicata, Italy, in 2014 and then the Ph.D. in Computer Science from the University of Salento, Italy (in collaboration with the University of Basilicata) in 2018. He then joined the Department of Informatics at the University of Bari, where he is currently a postdoctoral research fellow. He has served in the organization and has been a program committee member of different conferences such as ESEM, ICPC, SEAA, and PROFES. His research interests include: software refactoring, software testing, test-driven development, empirical software engineering, and human factors in software engineering.
Giuseppe Scanniello received his Laurea and Ph.D. degrees, both in Computer Science, from the University of Salerno, Italy, in 2001 and 2003, respectively. In 2006, he joined, as an Assistant Professor, the Department of Mathematics and Computer Science at the University of Basilicata, Potenza, Italy. In 2015, he became an Associate Professor at the same university. His research interests include requirements engineering, empirical software engineering, reverse engineering, reengineering, software visualization, workflow automation, migration, wrapping, integration, testing, green software engineering, global software engineering, cooperative supports for software engineering, visual languages and e-learning. He has published more than 160 refereed papers in journals, books, and conference proceedings. He serves in the organization of major international conferences (as general chair, program co-chair, proceedings chair, and member of the program committee) and workshops in the field of software engineering (e.g., ICSE, ASE, ICSME, ICPC, SANER, and many others). Giuseppe Scanniello leads both the group and the laboratory of software engineering at the University of Basilicata (BASELab). He recently obtained the Italian National Scientific Qualification as Full Professor in Computer Science. He is a member of IEEE and IEEE Computer Society. More on: sites.google.com/view/prof-giuseppe-scanniello/home.
Burak Turhan is an Associate Professor in Cyber Security & Software Systems at Monash University. His research focuses on empirical software engineering, software analytics, quality assurance and testing, human factors, (agile) development processes, and digital health. Dr. Turhan has published over 100 articles in international journals and conferences, received several best paper awards, and secured several large-scale external research grants. He has served on the program committees of over 30 academic conferences, on the editorial or review boards of several top-tier software engineering journals, and as (co-)chair for PROMISE’13, ESEM’17, and PROFES’17 conferences. He is a member of ACM, ACM SIGSOFT, IEEE and IEEE Computer Society. For more information please visit: turhanb.net.
- ☆ Editor: Sarah Beecham.
- 1 The authors have equally contributed to the research presented in the paper.