
Evaluating the agreement among technical debt measurement tools: building an empirical benchmark of technical debt liabilities


Abstract

Software teams are often asked to deliver new features within strict deadlines, leading developers to deliberately or inadvertently ship “not quite right” code that compromises software quality and maintainability. This non-ideal state of software is effectively captured by the Technical Debt (TD) metaphor, which reflects the additional effort that has to be spent to maintain software. Although several tools are available for assessing TD, each tool essentially checks software against a particular ruleset. The use of different rulesets can be beneficial, as it leads to the identification of a wider set of problems; however, in the common scenario where developers or researchers rely on a single tool, diverging TD estimates and the identification of different mitigation actions limit the credibility and applicability of the findings. The objective of this study is two-fold. First, we evaluate the degree of agreement among leading TD assessment tools. Second, we propose a framework that captures the diversity of the examined tools with the aim of identifying a few “reference assessments” (or class/file profiles) representing characteristic cases of classes/files with respect to their level of TD. By extracting sets of classes/files that exhibit similarity to a selected profile (e.g., high TD levels in all employed tools), we establish a basis that can be used either for prioritizing maintenance activities or for training more sophisticated TD identification techniques. The proposed framework is illustrated through a case study on fifty (50) open source projects in two programming languages (Java and JavaScript), employing three leading TD tools.


Notes

  1. https://www.omg.org/spec/ATDM/About-ATDM

  2. https://www.castsoftware.com/

  3. https://www.vector.com/int/en/products/products-a-z/software/squore/squore-software-analytics-for-project-monitoring/

  4. https://www.sonarqube.org

  5. The term ‘class’ refers to the unit of analysis for Java projects, while the term ‘file’ refers to the unit of analysis for JavaScript projects. Throughout the paper we primarily use the term ‘class’ for simplicity, but both units of analysis are considered.

  6. https://shiny.rstudio.com

  7. https://www.r-project.org

  8. http://stains.csd.auth.gr

  9. https://se.uom.gr/index.php/projects/technical-debt-benchmarking

  10. https://se.uom.gr

  11. https://ieeexplore.ieee.org

  12. https://dl.acm.org/

  13. https://scitools.com/

  14. https://www.kiuwan.com/

  15. https://www.omg.org/spec/ATDM/About-ATDM

  16. https://doi.org/10.5281/zenodo.3966202

  17. The idea of archetypes was developed by the psychologist C. Jung in his studies on the drivers of human behavior. Pearson suggested the use of 12 archetypes, among which the ‘Ruler’ denotes personalities whose goal is to create a prosperous, successful family or community, while for the ‘Rebel’ (also known as the Outlaw) the motto is that rules are made to be broken. In our context, the ‘Ruler’ profile denotes a community of classes sharing the same assessment by all employed tools, while the ‘Rebel’ points to tools that, in a sense, break the rules and identify TD items differently from the rest.

  18. tool: https://se.uom.gr/index.php/projects/technical-debt-benchmarking

  19. https://doi.org/10.5281/zenodo.3966202

  20. The Partner archetype refers to personalities whose goal is being in a relationship with people and surroundings. In analogy, the Partner profile in our case denotes cases where two of the three tools exhibit high agreement.

  21. https://github.com/theoam/TDBenchmarker

  22. https://doi.org/10.5281/zenodo.3951041

  23. https://github.com/

  24. https://github.com/mauricioaniche/ck

  25. https://www.omg.org/spec/ATDM/About-ATDM

References

  • Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc Ser B Methodol 44:139–177


  • Alves NSR, Mendes TS, de Mendonça MG, Spínola RO, Shull F, Seaman C (2016) Identification and management of technical debt: a systematic mapping study. Inf Softw Technol 70:100–121. https://doi.org/10.1016/j.infsof.2015.10.008


  • Alves TL, Ypma C, Visser J (2010) Deriving metric thresholds from benchmark data, in: 2010 IEEE international conference on software maintenance. Presented at the 2010 IEEE international conference on software maintenance, pp. 1–10. https://doi.org/10.1109/ICSM.2010.5609747

  • Arvedahl S (2018) Introducing Debtgrep, a tool for fighting technical debt in Base Station software, in: proceedings of the 2018 international conference on technical debt, TechDebt ‘18. ACM, New York, NY, USA, pp. 51–52. https://doi.org/10.1145/3194164.3194183

  • Baggen R, Correia JP, Schill K, Visser J (2012) Standardized code quality benchmarking for improving software maintainability. Softw Qual J 20:287–307. https://doi.org/10.1007/s11219-011-9144-9


  • Baldassari B (2013) SQuORE: a new approach to software project assessment

  • Campbell GA, Papapetrou PP (2013) SonarQube in action, 1st edn. Manning Publications Co., Greenwich


  • Canhasi E, Kononenko I (2014) Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization. Expert Syst Appl 41:535–543. https://doi.org/10.1016/j.eswa.2013.07.079


  • Chan BHP, Mitchell DA, Cram LE (2003) Archetypal analysis of galaxy spectra. Mon Not R Astron Soc 338:790–795. https://doi.org/10.1046/j.1365-8711.2003.06099.x


  • Chatzipetrou P, Angelis L, Rovegård P, Wohlin C (2010) Prioritization of issues and requirements by cumulative voting: a compositional data analysis framework, in: 2010 36th EUROMICRO conference on software engineering and advanced applications. Presented at the 2010 36th EUROMICRO conference on software engineering and advanced applications, pp. 361–370. https://doi.org/10.1109/SEAA.2010.35

  • Chatzipetrou P, Papatheocharous E, Angelis L, Andreou AS (2015) A multivariate statistical framework for the analysis of software effort phase distribution. Inf Softw Technol 59:149–169. https://doi.org/10.1016/j.infsof.2014.11.004


  • Chopra K, Sachdeva M (2015) Evaluation of software metrics for software projects. Int. J. Comput. Technol. 14:5845–5853. https://doi.org/10.24297/ijct.v14i6.1915


  • Cohen J (1968) Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull 70:213–220. https://doi.org/10.1037/h0026256


  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46. https://doi.org/10.1177/001316446002000104


  • Conejero JM, Rodríguez-Echeverría R, Hernández J, Clemente PJ, Ortiz-Caraballo C, Jurado E, Sánchez-Figueroa F (2018) Early evaluation of technical debt impact on maintainability. J Syst Softw 142:92–114. https://doi.org/10.1016/j.jss.2018.04.035


  • Cunningham W (1992) The WyCash portfolio management system, in: addendum to the proceedings on object-oriented programming systems, languages, and applications (addendum), OOPSLA ‘92. ACM, New York, pp 29–30. https://doi.org/10.1145/157709.157715


  • Curtis B, Sappidi J, Szynkarski A (2012) Estimating the principal of an Application’s technical debt. IEEE Softw 29:34–42. https://doi.org/10.1109/MS.2012.156


  • Cutler A, Breiman L (1994) Archetypal analysis. Technometrics 36:338–347. https://doi.org/10.1080/00401706.1994.10485840


  • DeMarco T (1986) Controlling software projects: management, measurement, and estimates, 1st edn. Prentice Hall, Englewood Cliffs, NJ

  • Döhmen T, Bruntink M, Ceolin D, Visser J (2016) Towards a Benchmark for the Maintainability Evolution of Industrial Software Systems, in: 2016 Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement (IWSM-MENSURA). Presented at the 2016 Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement (IWSM-MENSURA), pp. 11–21. https://doi.org/10.1109/IWSM-Mensura.2016.014

  • Elze T, Pasquale LR, Shen LQ, Chen TC, Wiggs JL, Bex PJ (2015) Patterns of functional vision loss in glaucoma determined with archetypal analysis. J R Soc Interface 12:20141118. https://doi.org/10.1098/rsif.2014.1118


  • Ernst NA, Bellomo S, Ozkaya I, Nord RL (2017) What to fix? Distinguishing between design and non-design rules in automated tools, in: 2017 IEEE international conference on software architecture (ICSA). Presented at the 2017 IEEE international conference on software architecture (ICSA), pp. 165–168. https://doi.org/10.1109/ICSA.2017.25

  • Eugster MJA (2012) Performance profiles based on archetypal athletes. Int J Perform Anal Sport 12:166–187. https://doi.org/10.1080/24748668.2012.11868592


  • Fernández-Sánchez C, Humanes H, Garbajosa J, Díaz J (2017). An open tool for assisting in technical debt management, in: 2017 43rd Euromicro conference on software engineering and advanced applications (SEAA). Presented at the 2017 43rd Euromicro conference on software engineering and advanced applications (SEAA), pp. 400–403. https://doi.org/10.1109/SEAA.2017.60

  • Ferreira KAM, Bigonha MAS, Bigonha RS, Mendes LFO, Almeida HC (2012) Identifying thresholds for object-oriented software metrics. J Syst Softw 85:244–257. https://doi.org/10.1016/j.jss.2011.05.044

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76:378–382. https://doi.org/10.1037/h0031619


  • Foganholi LB, Garcia RE, Eler DM, Correia RCM, Junior CO (2015) Supporting technical debt cataloging with TD-tracker tool. Adv Soft Eng 2015, 4:4–4:4. https://doi.org/10.1155/2015/898514

  • Fontana FA, Roveda R, Vittori S, Metelli A, Saldarini S, Mazzei F (2016a) On Evaluating the Impact of the Refactoring of Architectural Problems on Software Quality, in: Proceedings of the Scientific Workshop Proceedings of XP2016, XP ‘16 Workshops. ACM, New York, NY, USA, pp. 21:1–21:8. https://doi.org/10.1145/2962695.2962716

  • Fontana FA, Roveda R, Zanoni M (2016b) Technical debt indexes provided by tools: a preliminary discussion, in: 2016 IEEE 8th international workshop on managing technical debt (MTD). Presented at the 2016 IEEE 8th international workshop on managing technical debt (MTD), pp. 28–31. https://doi.org/10.1109/MTD.2016.11

  • Griffith I, Reimanis D, Izurieta C, Codabux Z, Deo A, Williams B (2014) The correspondence between software quality models and technical debt estimation approaches, in: 2014 sixth international workshop on managing technical debt. Presented at the 2014 sixth international workshop on managing technical debt, pp. 19–26. https://doi.org/10.1109/MTD.2014.13

  • Gwet KL (2014) Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among raters, 4th edn. Advanced Analytics, LLC, Gaithersburg, MD

  • Holvitie J, Leppänen V (2013) DebtFlag: technical debt management with a development environment integrated tool, in: proceedings of the 4th international workshop on managing technical debt, MTD ‘13. IEEE Press, Piscataway, pp 20–27


  • Izurieta C, Vetrò A, Zazworka N, Cai Y, Seaman C, Shull F (2012) Organizing the technical debt landscape, in: 2012 third international workshop on managing technical debt (MTD). Presented at the 2012 third international workshop on managing technical debt (MTD), pp. 23–26. https://doi.org/10.1109/MTD.2012.6225995

  • Kazman R, Cai Y, Mo R, Feng Q, Xiao L, Haziyev S, Fedak V, Shapochka A (2015) A case study in locating the architectural roots of technical debt, in: 2015 IEEE/ACM 37th IEEE international conference on software engineering. Presented at the 2015 IEEE/ACM 37th IEEE international conference on software engineering, pp. 179–188. https://doi.org/10.1109/ICSE.2015.146

  • Kendall MG (1948) Rank correlation methods, Rank correlation methods. Griffin, Oxford, England

  • Kosti MV, Feldt R, Angelis L (2016) Archetypal personalities of software engineers and their work preferences: a new perspective for empirical studies. Empir Softw Eng 21:1509–1532. https://doi.org/10.1007/s10664-015-9395-3


  • Li S, Wang P, Louviere J, Carson R (2003) Archetypal analysis: a new way to segment markets based on extreme individuals 6

  • Li Z, Avgeriou P, Liang P (2015) A systematic mapping study on technical debt and its management. J Syst Softw 101:193–220. https://doi.org/10.1016/j.jss.2014.12.027


  • Marinescu R (2012) Assessing technical debt by identifying design flaws in software systems. IBM J Res Dev 56:9:1–9:13. https://doi.org/10.1147/JRD.2012.2204512


  • Martín-Fernández JA, Barceló-Vidal C, Pawlowsky-Glahn V (2003) Dealing with Zeros and missing values in compositional data sets using nonparametric imputation. Math Geol 35:253–278. https://doi.org/10.1023/A:1023866030544


  • Martini A, Bosch J (2016) An empirically developed method to aid decisions on architectural technical debt refactoring: AnaConDebt, in: 2016 IEEE/ACM 38th international conference on software engineering companion (ICSE-C). Presented at the 2016 IEEE/ACM 38th international conference on software engineering companion (ICSE-C), pp. 31–40

  • Mayr A, Plösch R, Körner C (2014) A benchmarking-based model for technical debt calculation, in: 2014 14th international conference on quality software. Presented at the 2014 14th international conference on quality software, pp. 305–314. https://doi.org/10.1109/QSIC.2014.35

  • Mendes TS, Gomes FGS, Gonçalves DP, Mendonça MG, Novais RL, Spínola RO (2019) VisminerTD: a tool for automatic identification and interactive monitoring of the evolution of technical debt items. J Braz Comput Soc 25:2. https://doi.org/10.1186/s13173-018-0083-1


  • Mittas N, Angelis L (2020) Data-driven benchmarking in software development effort estimation: the few define the bulk. J Softw Evol Process e2258. https://doi.org/10.1002/smr.2258

  • Mittas N, Karpenisi V, Angelis L (2014) Benchmarking effort estimation models using archetypal analysis, in: proceedings of the 10th international conference on predictive models in software engineering, PROMISE ‘14. ACM, New York, pp 62–71. https://doi.org/10.1145/2639490.2639502


  • Moliner J, Epifanio I (2019) Robust multivariate and functional archetypal analysis with application to financial time series analysis. Phys Stat Mech Its Appl 519:195–208. https://doi.org/10.1016/j.physa.2018.12.036


  • Mori A, Vale G, Viggiato M, Oliveira J, Figueiredo E, Cirilo E, Jamshidi P, Kastner C (2018) Evaluating domain-specific metric thresholds: an empirical study, in: 2018 IEEE/ACM international conference on technical debt (TechDebt). Presented at the 2018 IEEE/ACM international conference on technical debt (TechDebt), pp. 41–50

  • Nayebi M, Cai Y, Kazman R, Ruhe G, Feng Q, Carlson C, Chew F (2019) A longitudinal study of identifying and paying down architecture debt, in: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP). Presented at the 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP), pp. 171–180. https://doi.org/10.1109/ICSE-SEIP.2019.00026

  • Nugroho A, Visser J, Kuipers T (2011) An empirical model of technical debt and interest, in: proceedings of the 2nd workshop on managing technical debt, MTD ‘11. Association for Computing Machinery, Waikiki, Honolulu, HI, USA, pp. 1–8. https://doi.org/10.1145/1985362.1985364

  • Oliveira P, Lima FP, Valente MT, Serebrenik A (2014a) RTTool: a tool for extracting relative thresholds for source code metrics, in: 2014 IEEE international conference on software maintenance and evolution. Pp. 629–632. https://doi.org/10.1109/ICSME.2014.112

  • Oliveira P, Valente MT, Lima FP (2014b) Extracting relative thresholds for source code metrics, in: 2014 software evolution week - IEEE conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE). Presented at the 2014 software evolution week - IEEE conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE), pp. 254–263. https://doi.org/10.1109/CSMR-WCRE.2014.6747177

  • Pawlowsky-Glahn V, Egozcue JJ (2001) Geometric approach to statistical analysis on the simplex. Stoch Environ Res Risk Assess 15:384–398. https://doi.org/10.1007/s004770100077


  • Pearson CS (2015) Awakening the heroes within: twelve archetypes to help us find ourselves and transform our world, 1st edn. HarperOne, San Francisco

  • Pinheiro J, Bates D (2000) Mixed-effects models in S and S-PLUS, statistics and computing. Springer-Verlag, New York


  • Porzio GC, Ragozini G, Vistocco D (2008) On the use of archetypes as benchmarks. Appl Stoch Models Bus Ind 24:419–437. https://doi.org/10.1002/asmb.727


  • Runeson P, Höst M (2008) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14:131. https://doi.org/10.1007/s10664-008-9102-8


  • Sadowski C, van Gogh J, Jaspan C, Söderberg E, Winter C (2015) Tricorder: building a program analysis ecosystem, in: 2015 IEEE/ACM 37th IEEE international conference on software engineering. Presented at the 2015 IEEE/ACM 37th IEEE international conference on software engineering, pp. 598–608. https://doi.org/10.1109/ICSE.2015.76

  • Salkind NJ (ed) (2010) Encyclopedia of research design, 1st edn. SAGE Publications, Thousand Oaks

  • Schmidt RC (1997) Managing Delphi surveys using nonparametric statistical techniques*. Decis Sci 28:763–774. https://doi.org/10.1111/j.1540-5915.1997.tb01330.x


  • Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19:321–325


  • Seiler C, Wohlrabe K (2013) Archetypal scientists. J Informetr 7:345–356. https://doi.org/10.1016/j.joi.2012.11.013


  • van Solingen R, Basili V, Caldiera G, Rombach HD (2002) Goal question metric (GQM) approach. In: Encyclopedia of software engineering. Wiley. https://doi.org/10.1002/0471028959.sof142

  • Thøgersen JC, Mørup M, Damkiær S, Molin S, Jelsbak L (2013) Archetypal analysis of diverse Pseudomonas aeruginosa transcriptomes reveals adaptation in cystic fibrosis airways. BMC Bioinformatics 14:279. https://doi.org/10.1186/1471-2105-14-279


  • Tornhill A (2018) Assessing technical debt in automated tests with CodeScene, in: 2018 IEEE international conference on software testing, verification and validation workshops (ICSTW). Presented at the 2018 IEEE international conference on software testing, verification and validation workshops (ICSTW), pp. 122–125. https://doi.org/10.1109/ICSTW.2018.00039

  • Tsanousa A, Laskaris N, Angelis L (2015) A novel single-trial methodology for studying brain response variability based on archetypal analysis. Expert Syst Appl 42:8454–8462. https://doi.org/10.1016/j.eswa.2015.06.058


  • Veado L, Vale G, Fernandes E, Figueiredo E (2016) TDTool: threshold derivation tool, in: proceedings of the 20th international conference on evaluation and assessment in software engineering, EASE ‘16. ACM, New York, NY, USA, pp. 24:1–24:5. https://doi.org/10.1145/2915970.2916014

  • Watson PF, Petrie A (2010) Method agreement analysis: a review of correct methodology. Theriogenology 73:1167–1179. https://doi.org/10.1016/j.theriogenology.2010.01.003


  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction, International Series in Software Engineering. Springer US

  • Xiao L, Cai Y, Kazman R (2014a) Titan: a toolset that connects software architecture with quality analysis, in: proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014. Association for Computing Machinery, Hong Kong, pp 763–766. https://doi.org/10.1145/2635868.2661677


  • Xiao L, Cai Y, Kazman R (2014b) Design rule spaces: a new form of architecture insight, in: proceedings of the 36th international conference on software engineering, ICSE 2014. Association for Computing Machinery, Hyderabad, pp 967–977. https://doi.org/10.1145/2568225.2568241


  • Yamashita A (2015) Experiences from performing software quality evaluations via combining benchmark-based metrics analysis, software visualization, and expert assessment, in: 2015 IEEE international conference on software maintenance and evolution (ICSME). Presented at the 2015 IEEE international conference on software maintenance and evolution (ICSME), pp. 421–428. https://doi.org/10.1109/ICSM.2015.7332493

  • Zazworka N, Vetro’ A, Izurieta C, Wong S, Cai Y, Seaman C, Shull F (2014) Comparing four approaches for technical debt identification. Softw Qual J 22:403–426. https://doi.org/10.1007/s11219-013-9200-8


  • Zuur A, Ieno EN, Walker N, Saveliev AA, Smith GM (2009) Mixed effects models and extensions in ecology with R, statistics for biology and health. Springer-Verlag, New York. https://doi.org/10.1007/978-0-387-87458-6


Acknowledgements

This research is funded by the University of Macedonia Research Committee as part of the “Principal Research 2019” funding program.

Author information


Corresponding author

Correspondence to Nikolaos Mittas.

Additional information

Communicated by: Forrest Shull

This paper has been awarded the Empirical Software Engineering (EMSE) open science badge.

Appendix

In this Appendix we report a sensitivity analysis conducted to examine the variability of the classes belonging to the Max-Ruler archetype in terms of their a-coefficients. After the archetypes are found and the original data points are expressed through their a-coefficients, the transformed data are of a special type: the new variables are non-negative and their sum is fixed at 1. Such data are called compositional (or proportional) data, and the inherent correlation among the variables raises methodological problems. Classical statistical methodologies should not be applied (Aitchison, 1982), since their assumptions are violated. Instead, a dedicated branch of statistics, Compositional Data Analysis (CoDA), introduced by Aitchison (1982), provides theories, methods, and tools for analyzing such data; these methods have already been used in the context of software engineering (Chatzipetrou et al., 2015, 2010).

Briefly, the sample space of compositional data (the a-coefficients in our case) is the simplex defined by \( \sum_{j=1}^{k} a_{ij} = 1 \) with \( a_{ij} \ge 0 \) for i = 1, …, n, as already given in Eq. (4) of the manuscript. In our study we adopted two standard CoDA techniques: the centered log-ratio (clr) transformation and, since zero values cause problems in log-ratio analysis, the multiplicative replacement strategy for imputing zeros (Martín-Fernández et al., 2003).
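To make this preprocessing concrete, the following base-R sketch applies the multiplicative zero replacement and the clr transformation to a small, purely hypothetical matrix of a-coefficients; the matrix A and the value of delta are illustrative assumptions, not data from the study.

    # Hypothetical a-coefficients: one row per class, k = 3 archetypes, each row sums to 1.
    A <- rbind(c(0.70, 0.30, 0.00),
               c(0.85, 0.10, 0.05),
               c(0.60, 0.25, 0.15))

    # Multiplicative replacement (Martín-Fernández et al. 2003): substitute a small
    # delta for each zero and rescale the non-zero parts so each row still sums to 1.
    mult_replace <- function(x, delta = 1e-5) {
      zeros <- x == 0
      x[zeros] <- delta
      x[!zeros] <- x[!zeros] * (1 - sum(zeros) * delta)
      x
    }
    A_imp <- t(apply(A, 1, mult_replace))

    # Centered log-ratio (clr): log of each part relative to the geometric mean of its row.
    clr <- function(x) log(x) - mean(log(x))
    A_clr <- t(apply(A_imp, 1, clr))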

After this preprocessing, we computed for each Max-Ruler archetype and its corresponding classes a global measure of spread proposed by Pawlowsky-Glahn and Egozcue (2001), namely the metric standard deviation (MSD), given by the following formula

$$ MSD=\sqrt{\frac{1}{k-1} MVAR(X)} $$
(6)

where \( MVAR(X)=\frac{1}{m-1}\sum_{i=1}^{m} d^2\left(\mathbf{x}_i,\overline{\mathbf{x}}\right) \) is the metric variance, d(·,·) denotes the Aitchison distance between two compositions (the Euclidean distance between their clr-transformed vectors), \( \overline{\mathbf{x}} \) is the center of the group, and m is the number of classes belonging to the Max-Ruler archetype.
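As an illustration of Eq. (6), the base-R sketch below computes MVAR and MSD for one group of classes, continuing from the hypothetical A_clr matrix defined above.

    # Metric variance and metric standard deviation for one group of clr-transformed
    # a-coefficients (rows = classes belonging to a Max-Ruler archetype).
    metric_sd <- function(A_clr) {
      k      <- ncol(A_clr)                          # number of parts (archetypes)
      m      <- nrow(A_clr)                          # number of classes in the group
      centre <- colMeans(A_clr)                      # clr-space centre of the group
      d2     <- rowSums(sweep(A_clr, 2, centre)^2)   # squared Aitchison distances d^2(x_i, x-bar)
      mvar   <- sum(d2) / (m - 1)                    # metric variance MVAR(X)
      sqrt(mvar / (k - 1))                           # metric standard deviation, Eq. (6)
    }
    metric_sd(A_clr)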

Appendix Table 8 summarizes the MSD values for the set of experiments conducted in the sensitivity analysis, following an approach similar to the analysis of percentages of high-TD classes presented above. The general intuition from inspecting the estimated MSD values is that the spread of the classes increases with the threshold a only for JavaScript projects, whereas the spread appears largely unchanged for Java projects. Indeed, the comparison of the two LME models (the full model including the interaction term vs. the model without it) indicated a statistically significant difference (χ² = 39.786, p < 0.001), and thus the interaction term should be retained in the final model. In practical terms, the spread depends on the combination of the levels of the two examined factors (Threshold and Language). In this regard, the post-hoc analysis through Tukey’s HSD test did not reveal a statistically significant difference for any pairwise comparison of the levels of factor Threshold for Java projects. In contrast, statistically significant differences were noted for specific levels of factor Threshold for JavaScript projects, forming four overlapping homogeneous groups: A = {0.60, 0.65, 0.70}, B = {0.65, 0.70, 0.75}, C = {0.70, 0.75, 0.80} and D = {0.80, 0.85, 0.90}.

Table 8 Estimated mean MSD with 95% CI for each threshold value a (sensitivity analysis)
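For readers who wish to reproduce this kind of model comparison, the sketch below outlines how it could be run in R with the nlme (Pinheiro and Bates 2000) and multcomp packages. The data frame msd_df and its columns (msd, threshold, language, project) are hypothetical placeholders, so this is a sketch of the procedure rather than the exact script used in the study.

    library(nlme)      # linear mixed-effects models
    library(multcomp)  # Tukey-style multiple comparisons

    # threshold and language are assumed to be factors; project is the random grouping.
    m_full <- lme(msd ~ threshold * language, random = ~ 1 | project,
                  data = msd_df, method = "ML")
    m_main <- lme(msd ~ threshold + language, random = ~ 1 | project,
                  data = msd_df, method = "ML")

    # Likelihood-ratio test: a significant chi-square favours keeping the interaction term.
    anova(m_main, m_full)

    # Post-hoc pairwise comparisons of the threshold levels within one language subset.
    m_js <- lme(msd ~ threshold, random = ~ 1 | project,
                data = subset(msd_df, language == "JavaScript"), method = "ML")
    summary(glht(m_js, linfct = mcp(threshold = "Tukey")))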


Cite this article

Amanatidis, T., Mittas, N., Moschou, A. et al. Evaluating the agreement among technical debt measurement tools: building an empirical benchmark of technical debt liabilities. Empir Software Eng 25, 4161–4204 (2020). https://doi.org/10.1007/s10664-020-09869-w
