Studying test-driven development and its retainment over a six-month time span

doi:10.1016/j.jss.2021.110937

Journal of Systems and Software

Volume 176, June 2021, 110937

https://doi.org/10.1016/j.jss.2021.110937

Highlights

• TDD is retained by developers for at least a six-month time span.
• TDD does not affect the external quality of software products.
• TDD does not affect developers’ productivity.
• Developers applying TDD produce significantly more tests.
• Tests written when applying TDD have a higher fault-detection capability.

Abstract

In this paper, we investigate the effect of TDD, as compared to a non-TDD approach, as well as its retainment (or retention) over a time span of (about) six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of software products nor developers’ productivity. However, we observed that the participants applying TDD produced significantly more tests, with a higher fault-detection capability, than those using a non-TDD approach. As for the retainment of TDD, we found that TDD is retained by novice developers for at least six months.

Introduction

Test-Driven Development (TDD) (Beck, 2003, Astels, 2003) is a cyclic development approach in which unit tests drive the incremental development of small pieces of functionality (Erdogmus et al., 2010). Each development cycle starts with writing unit tests for a piece of functionality that has not yet been implemented. A cycle ends when these unit tests, as well as the existing regression test suite, pass. Refactoring plays an important role in the process underlying TDD: it allows a TDD practitioner to improve the internal structure and design of the code while preserving its external behavior, thanks to the safety net provided by the existing regression test suite (Astels, 2003). At the end of a cycle, a TDD practitioner can tackle a new, not yet implemented piece of functionality, thus starting a new development cycle (Beck, 2003, Astels, 2003). Advocates of TDD recommend completing a development cycle within a few minutes (five to ten minutes; Jeffries and Melnik, 2007) and keeping the rhythm as uniform as possible over time (Beck, 2003, Erdogmus et al., 2010). The position that unit tests take within the process underlying TDD (i.e., a test is written before the corresponding production code) is known as test-first sequencing (or test-first dynamic) (Fucci et al., 2017). It is worth noting that test-first sequencing refers to just one central aspect of TDD (Karac and Turhan, 2018); that is, it does not capture the full nature of TDD (Fucci et al., 2017). Other central aspects that characterize the development process underlying TDD are granularity, uniformity, and refactoring effort (Fucci et al., 2017). Granularity refers to the duration of the development cycles, while uniformity reflects how constant their duration is over time (Fucci et al., 2017). Finally, refactoring effort captures how much refactoring a TDD practitioner performs.
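To make the four process aspects more concrete, the following is a minimal sketch of how they might be summarized from a log of development cycles. It assumes that cycles have already been identified and labeled upstream (for instance, by a process-conformance tool such as Besouro, cited in the references). The `Cycle` record, its labels, and the choice of the median for granularity and the median absolute deviation for uniformity are illustrative assumptions, not necessarily the exact operationalization used in this study.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cycle:
    """One development cycle, as a hypothetical process log might record it."""
    duration_min: float  # cycle duration in minutes
    kind: str            # assumed labels: "test-first", "test-last", "refactoring"


def process_metrics(cycles: list[Cycle]) -> dict[str, float]:
    """Summarize the four process aspects under an assumed operationalization."""
    if not cycles:
        raise ValueError("at least one cycle is required")
    durations = [c.duration_min for c in cycles]
    centre = median(durations)
    n = len(cycles)
    return {
        # Granularity: typical cycle length (smaller means finer-grained cycles).
        "granularity": centre,
        # Uniformity: median absolute deviation of cycle lengths (smaller means more uniform).
        "uniformity": median([abs(d - centre) for d in durations]),
        # Test-first sequencing: share of cycles in which the test preceded production code.
        "sequencing_pct": 100 * sum(c.kind == "test-first" for c in cycles) / n,
        # Refactoring effort: share of cycles devoted to refactoring.
        "refactoring_pct": 100 * sum(c.kind == "refactoring" for c in cycles) / n,
    }


# Hypothetical log of four cycles.
log = [Cycle(4, "test-first"), Cycle(6, "test-first"), Cycle(5, "refactoring"), Cycle(12, "test-last")]
print(process_metrics(log))
# {'granularity': 5.5, 'uniformity': 1.0, 'sequencing_pct': 50.0, 'refactoring_pct': 25.0}
```

Median-based summaries are used here because a single very long cycle (the 12-minute one above) should not dominate the description of a developer's typical rhythm.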

It is claimed that TDD leads to higher-quality software products in terms of both external (i.e., functional) and internal quality, while increasing developers’ productivity (Beck, 2003). These claimed benefits have encouraged some software companies to adopt TDD, while others are considering its adoption (Tosun et al., 2017). TDD has been assessed both from a quantitative perspective (e.g., Fucci et al. (2016), Erdogmus et al. (2005)) and from a qualitative one (e.g., Romano et al. (2016), Scanniello et al. (2016)). A number of primary studies, such as experiments and case studies, have been conducted on TDD (Fucci et al., 2016, Erdogmus et al., 2005, George and Williams, 2004, Bhat and Nagappan, 2006, Nagappan et al., 2008). Their results, gathered and combined in a number of secondary studies (Karac and Turhan, 2018, Bissi et al., 2016, Fucci et al., 2015, Turhan et al., 2010, Munir et al., 2014, Rafique and Mišić, 2013), do not fully support the claimed benefits of TDD: while some primary studies have shown that TDD improves the quality of software products and/or developers’ productivity, others have not. Some researchers have conjectured that long-term observations are needed to see the claimed benefits of TDD and/or to better understand this development approach; therefore, they have recommended taking a longitudinal approach, i.e., studying TDD over a time span, when investigating it (Fucci et al., 2015, Munir et al., 2014, Shull et al., 2010, Müller and Höfer, 2007). Nevertheless, only Latorre (2014), Borle et al. (2017), Beller et al. (2017), and Marchenko et al. (2009) have taken a longitudinal approach.

Longitudinal studies² employ continuous or repeated measures to follow particular individuals over a time span of weeks, months, or even years (Caruana et al., 2015). In this paper, we present a study on TDD that takes a longitudinal approach. In particular, we conducted a longitudinal cohort study whose cohort consisted of 30 novice developers of homogeneous experience who attended the same training on agile software development, including TDD. The design of our study allowed us to compare TDD with a non-TDD approach, defined as the approach that developers would normally follow (e.g., iterative test-last, big-bang testing, or no testing at all, but not TDD), with respect to external quality, developers’ productivity, number of tests written, and fault-detection capability of the tests written. Moreover, thanks to our cohort, we collected separate measurements of the same constructs (i.e., external quality, developers’ productivity, number of tests written, fault-detection capability of the tests written, test-first sequencing, granularity, uniformity, and refactoring effort) (about) six months apart, with the goal of understanding how well TDD can be applied over time and thus obtaining an indication of its retainment (or retention).³
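In a cohort design like this one, the same individuals are measured on each construct at two points in time, so the natural unit of analysis is the within-participant change. The sketch below illustrates only that pairing step. The participant IDs and values are hypothetical, and the study's actual inferential analysis (its statistical models) is not reproduced here.

```python
from statistics import mean


def paired_changes(first: dict[str, float], second: dict[str, float]) -> dict[str, float]:
    """Change (second session minus first) for each participant measured in both sessions."""
    # Only participants present at both time points contribute to a within-subject comparison.
    shared = sorted(first.keys() & second.keys())
    return {pid: second[pid] - first[pid] for pid in shared}


# Hypothetical measurements of one construct (e.g., number of tests written), keyed by participant ID.
session_1 = {"P01": 12, "P02": 8, "P03": 15}
session_2 = {"P01": 14, "P02": 8, "P04": 10}  # P03 missed the second session; P04 missed the first

changes = paired_changes(session_1, session_2)
print(changes)  # {'P01': 2, 'P02': 0}
print(mean(list(changes.values())))
```

Pairing by participant, rather than comparing the two sessions as independent groups, is what lets each developer serve as their own baseline.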

While we found no improvement due to TDD in either the external quality of software products or developers’ productivity, we observed that TDD leads to larger test suites with a higher fault-detection capability. Moreover, our results indicate that novice developers retain TDD for at least six months (i.e., the time span between when novice developers learned and applied TDD for the first time and when they applied TDD again).

This paper extends the one by Fucci et al. (2018) as follows:

• Since Fucci et al. had found that TDD leads developers to write more tests, we studied whether writing more tests actually implies a better fault-detection capability of those tests, in order to strengthen the conclusions of Fucci et al.’s study. We studied both the effect and the retainment of TDD with respect to the fault-detection capability of the tests written.
• We investigated the retainment of TDD with respect to four aspects that characterize the process underlying TDD: test-first sequencing, granularity, uniformity, and refactoring effort.
• We extended the inferential statistics by applying a second statistical model. This allowed us to mitigate, as much as possible, threats to the conclusion validity of the results shown in Fucci et al.’s paper.
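This section does not detail how fault-detection capability is measured. One widely used proxy is the mutation score: the fraction of deliberately seeded faults (mutants) that a test suite detects. The toy sketch below assumes that proxy, with a hypothetical `reference_max` function and hand-written mutants, to illustrate why a larger suite does not automatically detect more faults.

```python
from typing import Callable

BinOp = Callable[[int, int], int]
Test = Callable[[BinOp], bool]  # a test receives an implementation and reports pass (True) or fail


def reference_max(a: int, b: int) -> int:
    """Hypothetical production code under test."""
    return a if a >= b else b


# Hypothetical hand-written mutants, each seeding one small fault into reference_max.
MUTANTS: list[BinOp] = [
    lambda a, b: a if a <= b else b,        # relational operator flipped (computes min)
    lambda a, b: a,                         # comparison dropped (always returns a)
    lambda a, b: (a if a >= b else b) + 1,  # off-by-one in the returned value
]


def mutation_score(tests: list[Test], mutants: list[BinOp]) -> float:
    """Fraction of mutants killed, i.e., detected by at least one failing test."""
    # The score is only meaningful if the suite passes on the unmutated code.
    if not all(t(reference_max) for t in tests):
        raise ValueError("the test suite must pass on the original code")
    killed = sum(1 for m in mutants if not all(t(m) for t in tests))
    return killed / len(mutants)


weak: list[Test] = [lambda f: f(3, 3) == 3]
# More tests, but all with equal arguments: no extra fault is exposed.
padded: list[Test] = weak + [lambda f: f(4, 4) == 4, lambda f: f(0, 0) == 0]
# The same number of tests as padded, but they exercise both branches.
strong: list[Test] = weak + [lambda f: f(5, 2) == 5, lambda f: f(2, 5) == 5]

print(mutation_score(weak, MUTANTS))    # 0.3333333333333333
print(mutation_score(padded, MUTANTS))  # 0.3333333333333333
print(mutation_score(strong, MUTANTS))  # 1.0
```

`padded` and `strong` contain the same number of tests, yet only `strong` kills every mutant, which is why test count and fault-detection capability are treated as separate constructs.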

Paper structure. In Section 2, we outline work related to ours. We present our study in Section 3. The obtained results are presented and discussed in Section 4 and Section 5, respectively. Final remarks conclude the paper.


Related work

The effect of TDD on several outcomes – including functional quality and productivity, which are of interest for this study – has been the topic of several empirical studies, summarized in Systematic Literature Reviews (SLRs) and meta-analyses (Bissi et al., 2016, Turhan et al., 2010, Munir et al., 2014, Rafique and Mišić, 2013). The SLR by Turhan et al. (2010) includes 32 primary studies (e.g., controlled experiments and case studies) published from 2000 to 2009. The gathered evidence shows a

Empirical study

The goal of a longitudinal study is to investigate “how certain conditions change over time” (Yin, 2009). Therefore, the data collection happens over a time span and can require the researchers to be co-located with the case and context in which the phenomenon of interest takes place. In the context of software engineering, longitudinal studies are often associated with the case study methodology. In other cases, longitudinal studies are employed to observe the impact of a potentially

Results

In the remainder of this section, we first present the results of the descriptive statistics and exploratory analyses, and then the results of the inferential statistics.

Discussion

In this section, we first answer the RQs to delineate the main findings of our cohort study. We then discuss these findings and present their practical implications. Finally, we discuss the threats that might have affected the validity of these findings.

Conclusion

In this paper, we present the results from a (quantitative) longitudinal cohort study with 30 novice developers to investigate the effect of TDD, as compared to a non-TDD approach, as well as the retainment of TDD over a time span of (about) six months.

As for the comparison of TDD with a non-TDD approach, we show that TDD has no effect on external quality of software products and developers’ productivity. However, we observed that the participants practicing TDD wrote significantly more unit

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank the students for their participation in our study.

Maria Teresa Baldassarre received her Laurea degree with honors in informatics at the University of Bari, Italy, where she also received her PhD. She is currently an assistant professor. Her research interests focus on empirical software engineering, software quality assurance, and human factors in software engineering. She is responsible for several international research collaborations and is a partner of the SER&Practices spin-off company. Currently, she is representative of the University of Bari

References (51)

Becker, K., et al. Besouro: A framework for exploring compliance rules in automatic TDD behavior assessment. Inf. Softw. Technol. (2015).
Bissi, W., et al. The effects of test driven development on internal quality, external quality and productivity: A systematic review. Inf. Softw. Technol. (2016).
Fucci, D., et al. Towards an operationalization of test-driven development skills: An industrial empirical study. Inf. Softw. Technol. (2015).
George, B., et al. A structured experiment of test-driven development. Inf. Softw. Technol. (2004).
Munir, H., et al. Considering rigor and relevance when evaluating test driven development: A systematic review. Inf. Softw. Technol. (2014).
Romano, S., et al. Findings from a multi-method study on test-driven development. Inf. Softw. Technol. (2017).
Astels, D. Test Driven Development: A Practical Guide (2003).
Basili, V., et al. Building knowledge through families of experiments. IEEE Trans. Softw. Eng. (1999).
Beck, K. Test-Driven Development: By Example (2003).
Beller, M., et al. Developer testing in the IDE: Patterns, beliefs, and behavior. IEEE Trans. Softw. Eng. (2017).
Bhat, T., et al. Evaluating the efficacy of test-driven development: Industrial case studies.
Borle, N., et al. Analyzing the effects of test driven development in GitHub. Empir. Softw. Eng. (2017).
Caruana, E.J., et al. Longitudinal studies. J. Thorac. Dis. (2015).
Carver, J., et al. Issues in using students in empirical studies in software engineering education.
Causevic, A., et al. Factors limiting industrial adoption of test driven development: A systematic review.
Cook, T.D., et al. Experimental and Quasi-Experimental Designs for Generalized Causal Inference (2002).
Dieste, O., et al. Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir. Softw. Eng. (2017).
Erdogmus, H., et al. Test-driven development.
Erdogmus, H., et al. On the effectiveness of the test-first approach to programming. IEEE Trans. Softw. Eng. (2005).
Fucci, D., et al. A dissection of the test-driven development process: Does it really matter to test-first or to test-last? IEEE Trans. Softw. Eng. (2017).
Fucci, D., et al. A longitudinal cohort study on the retainment of test-driven development.
Fucci, D., et al. An external replication on the effects of test-driven development using a multi-site blind analysis approach.
Fucci, D., et al. A replicated experiment on the effectiveness of test-first development.
Höst, M., et al. Using students as subjects—A comparative study of students and professionals in lead-time impact assessment. Empir. Softw. Eng. (2000).
Jedlitschka, A., et al. Reporting experiments in software engineering.


Danilo Caivano is currently an Associate Professor of software engineering and project management with the Department of Computer Science, University of Bari Aldo Moro, and a Consultant for companies and organizations, especially in the field of research and development projects. He is also the Head of the SERLAB Research Laboratory and the Director of the short master in cyber security. He contributed to the creation of The Hack Space, Cyber Security Laboratory, University of Bari. He is also a member of the Board of Directors of the Southern Italy Chapter of the Project Management Institute, the Co-Ordinator of the PMISIC Academy, and a member of the Technical Scientific Committee of the Apulian Information Technology District and of the IT Strategic Steering Committee.

Davide Fucci is an Assistant Professor at Blekinge Institute of Technology (Sweden). He received his Ph.D. from the University of Oulu (Finland) in 2016. He has a strong background in empirical studies in software engineering, publishing in and serving as a committee member and reviewer for several venues in the field (e.g., ESEM, EMSE, IEEE TSE). Currently, his research interests lie in data-driven requirements engineering, test automation, and human aspects of software development. He started the AffectRE workshop series on emotional awareness in requirements engineering at RE and co-organized the SEmotion’19 workshop at ICSE. He is involved with the Software Engineering ReThought project at BTH and with the H2020 OpenReq project at the University of Hamburg. More on: orcid.org/0000-0002-0679-4361.

Natalia Juristo (grise.upm.es/miembros/natalia) has been a full professor of software engineering with the Computing School at the Technical University of Madrid (UPM) since 1997. Natalia held a FiDiPro (Finland Distinguished Professor) research grant at the University of Oulu from January 2013 to June 2018. She was the Director of the UPM M.Sc. in Software Engineering from 1992 to 2002 and the coordinator of the Erasmus Mundus European Master on SE (with the participation of the University of Bolzano, the University of Kaiserslautern and the University of Blekinge) from 2007 to 2012. Natalia will be General Chair for ICSE 2021, to be held in Madrid. She has served on several program committees (ICSE, RE, REFSQ, ESEM, ISESE, and others). She has been Program Chair for EASE 2013, ISESE 2004 and SEKE 1997 and General Chair for ESEM 2007, SNPD 2002, and SEKE 2001. She has been a member of several editorial boards, including TSE (Jan 2013–Dec 2017), EMSE (since 2002) and Software magazine (1997 to 2001), among others. Natalia has been guest editor of special issues in several journals, including EMSE, IEEE Software, JSS, DKE, and Int J Softw Eng Knowl Eng. Natalia was ranked number 10 (among experienced SE researchers) in a paper published in JSS in January 2019 that evaluates the 2010–2017 period.

Simone Romano received his master’s degree in Computer Engineering from the University of Basilicata, Italy, in 2014 and then the Ph.D. in Computer Science from the University of Salento, Italy (in collaboration with the University of Basilicata) in 2018. He then joined the Department of Informatics at the University of Bari, where he is currently a postdoctoral research fellow. He has served on the organizing and program committees of several conferences, such as ESEM, ICPC, SEAA, and PROFES. His research interests include software refactoring, software testing, test-driven development, empirical software engineering, and human factors in software engineering.

Giuseppe Scanniello received his Laurea and Ph.D. degrees, both in Computer Science, from the University of Salerno, Italy, in 2001 and 2003, respectively. In 2006, he joined the Department of Mathematics and Computer Science at the University of Basilicata, Potenza, Italy, as an Assistant Professor. In 2015, he became an Associate Professor at the same university. His research interests include requirements engineering, empirical software engineering, reverse engineering, reengineering, software visualization, workflow automation, migration, wrapping, integration, testing, green software engineering, global software engineering, cooperative supports for software engineering, visual languages and e-learning. He has published more than 160 refereed papers in journals, books, and conference proceedings. He serves on the organizing committees of major international conferences (as general chair, program co-chair, proceedings chair, and member of the program committee) and workshops in the field of software engineering (e.g., ICSE, ASE, ICSME, ICPC, SANER, and many others). Giuseppe Scanniello leads both the group and the laboratory of software engineering at the University of Basilicata (BASELab). He recently obtained the Italian National Scientific Qualification as Full Professor in Computer Science. He is a member of IEEE and IEEE Computer Society. More on: sites.google.com/view/prof-giuseppe-scanniello/home.

Burak Turhan is an Associate Professor in Cyber Security & Software Systems at Monash University. His research focuses on empirical software engineering, software analytics, quality assurance and testing, human factors, (agile) development processes, and digital health. Dr. Turhan has published over 100 articles in international journals and conferences, received several best paper awards, and secured several large-scale external research grants. He has served on the program committees of over 30 academic conferences, on the editorial or review boards of several top-tier software engineering journals, and as (co-)chair for PROMISE’13, ESEM’17, and PROFES’17 conferences. He is a member of ACM, ACM SIGSOFT, IEEE and IEEE Computer Society. For more information please visit: turhanb.net.

☆ Editor: Sarah Beecham.

¹ The authors contributed equally to the research presented in the paper.
