Deriving a usage-independent software quality metric

Abstract

Context

The extent of post-release use of software affects the number of faults, thus biasing quality metrics and adversely affecting associated decisions. The proprietary nature of usage data limited deeper exploration of this subject in the past.

Objective

To determine how software faults and software use are related and how, based on that, an accurate quality measure can be designed.

Method

Via Google Analytics we measure new users, usage intensity, usage frequency, exceptions, and release date and duration for complex proprietary mobile applications for Android and iOS. We utilize Bayesian Network and Random Forest models to explain the interrelationships and to derive a usage-independent release quality measure. To increase external validity, we also investigate the interrelationships among various code complexity measures, usage (downloads), and the number of issues for 520 NPM packages. We derived a usage-independent quality measure from these analyses and applied it to 4430 popular NPM packages to construct timelines comparing the perceived quality (number of issues) and our derived measure of quality over the lifetime of these packages.
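
To make the modeling step concrete, the following is a minimal illustrative sketch in R using the bnlearn package cited in the references together with the randomForest package; the data frame `releases`, its column names, and all values are synthetic placeholders, not the authors' actual data or code.

```r
# Minimal sketch (synthetic data, not the study's): learn a Bayesian network
# and fit a random forest over release-level usage and fault counts.
library(bnlearn)       # Bayesian network structure learning (Scutari 2010)
library(randomForest)  # random forest models

set.seed(1)
n <- 200
releases <- data.frame(
  new_users        = as.numeric(rpois(n, 500)),   # hypothetical columns
  usage_intensity  = rnorm(n, 30, 5),
  usage_frequency  = rnorm(n, 4, 1),
  release_duration = as.numeric(rpois(n, 30))
)
releases$exceptions <- as.numeric(rpois(n, 0.02 * releases$new_users))

# Learn a network structure with hill-climbing and fit its parameters.
dag <- hc(releases)
bn  <- bn.fit(dag, releases)

# A random forest to gauge which predictors matter for exceptions.
rf <- randomForest(exceptions ~ ., data = releases, importance = TRUE)
print(importance(rf))

# A usage-independent release quality measure in the spirit of the paper:
# exceptions (crashes) normalized by the number of new users.
releases$quality <- releases$exceptions / releases$new_users
```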

Results

We found the number of new users to be the primary factor determining the number of exceptions, and found no direct link between the intensity and frequency of software usage and software faults. Crashes increased as the 1.02–1.04 power of the number of new users for the Android app and as the 1.6 power for the iOS app. Release quality expressed as crashes per user was independent of other usage-related predictors, thus serving as a usage-independent measure of software quality. Usage also affected quality in NPM, where downloads were strongly associated with the number of issues, even after taking other code complexity measures into consideration. Unlike the mobile case, where exceptions per user decreased over time, the number of issues per download increased over time for 45.8% of the NPM packages.
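
A minimal sketch (again with synthetic data, not the study's) of how such a power-law exponent can be estimated: regress log crashes on log new users, so that the slope of the fit is the exponent.

```r
# Simulate new users and crashes with a known exponent of ~1.03, then
# recover the exponent from a log-log linear regression.
set.seed(2)
new_users <- rpois(500, 400)
crashes   <- rpois(500, 0.05 * new_users^1.03)

fit <- lm(log(crashes + 1) ~ log(new_users))   # +1 guards against log(0)
k   <- unname(coef(fit)[2])                    # estimated exponent
cat("Estimated power-law exponent k =", round(k, 2), "\n")
```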

Conclusions

We expect our results and our proposed quality measure to help gauge the release quality of software more accurately and to inspire further research in this area.

Notes

  1. In fact, we mean to measure one aspect of the quality of software

  2. https://support.avaya.com/products/P1574/avaya-equinox-for-android

  3. https://support.avaya.com/products/P0949/avaya-onex-mobile-sip-for-ios

  4. https://www.npmjs.com/package/escomplex#result-format

  5. A generative model specifies a joint probability distribution over all observed variables, whereas a discriminative model (like the ones obtained from regression or decision trees) models only the target variable(s) conditional on the predictor variables. Thus, while a discriminative model allows only sampling of the target variables conditional on the predictors, a generative model can be used, for example, to simulate (i.e. generate) values of any variable in the model. This is why generative models are essential for gaining an understanding of the underlying mechanics of a system (see the sketch following these notes).

  6. Hartemink’s pairwise mutual information method (Hartemink 2001).

  7. One extra / missing / reversed edge
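
The sketch below illustrates the generative/discriminative distinction of note 5 using bnlearn: a fitted Bayesian network can simulate all variables jointly, whereas a regression model only predicts its target given the predictors. The data frame and its columns are synthetic placeholders, not the authors' data or code.

```r
library(bnlearn)

set.seed(3)
d <- data.frame(new_users = rnorm(200, 500, 50),
                intensity = rnorm(200, 30, 5))
d$exceptions <- 0.02 * d$new_users + rnorm(200)

# Generative: the fitted network can simulate (generate) joint values of
# every variable in the model.
bn <- bn.fit(hc(d), d)
simulated <- rbn(bn, n = 1000)

# Discriminative: a regression only yields the target variable conditional
# on supplied predictor values.
disc <- lm(exceptions ~ new_users + intensity, data = d)
pred <- predict(disc, newdata = d)
```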

References

  • Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? An empirical case study on npm. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 385–395

  • Hauser A, Bühlmann P (2012) Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. J Mach Learn Res 13:2409–2464. http://jmlr.org/papers/v13/hauser12a.html

  • Amreen S, Bichescu B, Bradley R, Dey T, Ma Y, Mockus A, Mousavi S, Zaretzki R (2019) A methodology for measuring FLOSS ecosystems. Springer, Singapore

  • Balov N, Salzman P (2016) catnet: categorical Bayesian network inference. https://CRAN.R-project.org/package=catnet. R package version 1.15.0

  • Boehm BW, Brown JR, Lipow M (1976) Quantitative evaluation of software quality. In: Proceedings of the 2nd international conference on software engineering. IEEE Computer Society Press, pp 592–605

  • Borges H, Hora A, Valente MT (2016) Understanding the factors that impact the popularity of github repositories. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 334–344

  • Bottcher SG, Dethlefsen C (2013) deal: learning Bayesian networks with mixed variables. https://CRAN.R-project.org/package=deal. R package version 1.2-37

  • Briand LC, Wüst J, Daly JW, Porter DV (2000) Exploring the relationships between design measures and software quality in object-oriented systems. J Syst Software 51(3):245–273

  • Chatzidimitriou KC, Papamichail MD, Diamantopoulos T, Tsapanos M, Symeonidis AL (2018) npm-miner: an infrastructure for measuring the quality of the npm registry. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 42–45

  • Chickering DM (1996) Learning Bayesian networks is NP-complete. Learning from data: artificial intelligence and statistics V 112:121–130

  • Chlebus BS, Nguyen SH (1998) On finding optimal discretizations for two attributes. In: International conference on rough sets and current trends in computing. Springer, pp 537–544

  • Dalal SR, Mallows CL (1988) When should one stop testing software? J Am Stat Assoc 83(403):872–879

  • David (2014) https://developers.slashdot.org/story/17/01/14/0222245/nodejss-npm-is-now-the-largest-package-registry-in-the-world

  • Dey T, Mockus A (2018a) Are software dependency supply chain metrics useful in predicting change of popularity of npm packages? In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 66–69

  • Dey T, Mockus A (2018b) Modeling relationship between post-release faults and usage in mobile software. In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 56–65

  • Dey T, Ma Y, Mockus A (2019) Patterns of effort contribution and demand and user classification based on participation patterns in npm ecosystem. In: Proceedings of the fifteenth international conference on predictive models and data analytics in software engineering. ACM, pp 36–45

  • Duc AN, Mockus A, Hackbarth R, Palframan J (2014) Forking and coordination in multi-platform development: a case study. In: ESEM, Torino, pp 59:1–59:10. http://dl.acm.org/authorize?N14215

  • Fenton N, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689

  • Fenton N, Krause P, Neil M (2002) Software measurement: uncertainty and causal modeling. IEEE Softw 19(4):116–122

  • Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using Bayesian nets. Inf Softw Technol 49(1):32–43

  • Fenton N, Neil M, Marquez D (2008) Using Bayesian networks to predict software defects and reliability. Proceedings of the Institution of Mechanical Engineers Part O: Journal of Risk and Reliability 222(4):701–712

  • Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc, pp 196–205

  • Geron T (2012) Do ios apps crash more than android apps? A data dive. https://www.forbes.com/sites/tomiogeron/2012/02/02/does-ios-crash-more-than-android-a-data-dive

  • Hackbarth R, Mockus A, Palframan J, Sethi R (2016a) Customer quality improvement of software systems. IEEE Softw 33(4):40–45

  • Hackbarth R, Mockus A, Palframan J, Sethi R (2016b) Improving software quality as customers perceive it. IEEE Softw 33(4):40–45

  • Hahsler M, Chelluboina S, Hornik K, Buchta C (2011) The arules R-package ecosystem: analyzing interesting patterns from large transaction datasets. J Mach Learn Res 12:1977–1981. http://jmlr.csail.mit.edu/papers/v12/hahsler11a.html

  • Hartemink AJ (2001) Principled computational methods for the validation and discovery of genetic regulatory networks. Ph.D. thesis Massachusetts Institute of Technology

  • Herbsleb JD, Mockus A (2003) An empirical study of speed and communication in globally-distributed software development. IEEE Trans Softw Eng 29(6):481–494

  • Jones C (2011) Software quality in 2011: a survey of the state of the art. Namcook Analytics LLC. http://sqgne.org/presentations/2011-12/Jones-Sep-2011.pdf

  • Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47(11):1–26. http://www.jstatsoft.org/v47/i11/

  • Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773. http://doi.ieeecomputersociety.org/10.1109/TSE.2012.70

  • Kan SH (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co. Inc

  • Kenny GQ (1993) Estimating defects in commercial software during operational use. IEEE Trans Reliab 42(1):107–115

  • Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality?: An empirical case study of mozilla firefox. In: Proceedings of the 9th IEEE Working conference on mining software repositories. IEEE Press, pp 179–188

  • Kitchenham B, Pfleeger SL (1996) Software quality: the elusive target [special issues section]. IEEE Softw 13(1):12–21

  • Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT press

  • Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: do people and participation matter?. In: 2015 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 111–120

  • Li PL, Kivett R, Zhan Z, Jeon Se, Nagappan N, Murphy B, Ko AJ (2011) Characterizing the differences between pre-and post-release versions of software. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 716–725

  • Scutari M (2010) Learning Bayesian networks with the bnlearn R package. J Stat Softw 35(3):1–22. http://www.jstatsoft.org/v35/i03/

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and itk projects. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 192–201

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189. https://doi.org/10.1007/s10664-015-9381-9

  • Mockus A (2007) Software support tools and experimental work. In: Basili V et al. (eds) Empirical software engineering issues: critical assessments and future directions, vol LNCS 4336. Springer, pp 91–99

  • Mockus A (2013) Law of minor release: more bugs implies better software quality. http://mockus.org/papers/IWPSE13.pdf. International Workshop on Principles of Software Evolution, St Petersburg, Russia, Aug 18-19 2013. Keynote

  • Mockus A (2014) Engineering big data solutions. In: ICSE’14 FOSE, pp 85–99. http://dl.acm.org/authorize?N14216

  • Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180

  • Mockus A, Weiss D (2008a) Interval quality: relating customer-perceived quality to process quality. In: 2008 International conference on software engineering. ACM Press, Leipzig, pp 733–740. http://dl.acm.org/authorize?063910

  • Mockus A, Weiss D (2008b) Interval quality: relating customer-perceived quality to process quality. In: Proceedings of the 30th international conference on software engineering. ACM, pp 723–732

  • Mockus A, Zhang P, Li P (2005) Drivers for customer perceived software quality. In: ICSE 2005. ACM Press, St Louis, pp 225–233. http://dl.acm.org/authorize?860140

  • Mockus A, Zhang P, Li PL (2005) Predictors of customer perceived software quality. In: 27th International conference on software engineering, 2005. ICSE 2005. Proceedings. IEEE, pp 225–233

  • Mockus A, Hackbarth R, Palframan J (2013) Risky files: an approach to focus quality improvement effort. In: 9th Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp 691–694. http://dl.acm.org/authorize?6845890

  • Motulsky H (2014) When is R squared negative? Cross Validated. https://stats.stackexchange.com/q/12991 (version: 2014-05-06)

  • Nagarajan R, Scutari M, Lèbre S (2013) Bayesian networks in R, vol 122. Springer, pp 125–127

  • Neil M, Fenton N (1996) Predicting software quality using Bayesian belief networks. In: Proceedings of the 21st annual software engineering workshop. NASA Goddard Space Flight Centre, pp 217–230

  • Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181

  • Pai GJ, Dugan JB (2007) Empirical analysis of software fault content and fault proneness using Bayesian methods. IEEE Trans Softw Eng 33(10):675–686

  • Pearl J (2011) Bayesian networks. Department of Statistics UCLA

  • Pendharkar PC, Subramanian GH, Rodger JA (2005) A probabilistic model for predicting software development effort. IEEE Trans Softw Eng 31(7):615–624

  • Perez A, Larranaga P, Inza I (2006) Supervised classification with conditional gaussian networks: increasing the structure complexity from naive Bayes. Int J Approx Reason 43(1):1–25

  • R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 202–212

  • Rotella P, Chulani S (2011) Implementing quality metrics and goals at the corporate level. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 113–122

  • Rubin J, Rinard M (2016) The challenges of staying together while moving fast: an exploratory study. In: Proceedings of the 38th international conference on software engineering. ACM, pp 982–993

  • Schulmeyer GG, McManus JI (1992) Handbook of software quality assurance. Van Nostrand Reinhold Co

  • Scutari M (2013) Learning Bayesian networks in R, an example in systems biology. http://www.bnlearn.com/about/slides/slides-useRconf13.pdf

  • Scutari M, Strimmer K (2010) Introduction to graphical modelling. arXiv:1005.1036

  • Shmueli G (2010) To explain or to predict? Stat Sci, 289–310

  • Sober E (2002) Instrumentalism, parsimony, and the akaike framework. Philos Sci 69(S3):S112–S123

  • Stamelos I, Angelis L, Dimou P, Sakellaris E (2003) On the use of Bayesian belief networks for the prediction of software productivity. Inf Softw Technol 45(1):51–60

  • Subramanyam R, Krishnan MS (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310

  • Voss L (2014) Numeric precision matters: how npm download counts work. https://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts

  • Voss L (2018) The state of javascript frameworks, 2017. https://www.npmjs.com/npm/state-of-javascript-frameworks-2017-part-1

  • Wittern E, Suter P, Rajagopalan S (2016) A look at the dynamics of the javascript package ecosystem. In: 2016 IEEE/ACM 13th Working conference on mining software repositories (MSR). IEEE, pp 351–361

  • Yu P, Systä T, Müller H (2002) Predicting fault-proneness using OO metrics: an industrial case study. In: Sixth European conference on software maintenance and reengineering, 2002. Proceedings. IEEE, pp 99–107

  • Zerouali A, Constantinou E, Mens T, Robles G, González-Barahona J (2018) An empirical analysis of technical lag in npm package dependencies. In: International conference on software reuse. Springer, pp 95–110

  • Zhang F, Mockus A, Keivanloo I, Zou Y (2015) Towards building a universal defect prediction model with rank transformed predictors. Empir Softw Eng, 1–39

  • Zheng Q, Mockus A, Zhou M (2015) A method to identify and correct problematic software activity data: exploiting capacity constraints and data redundancies. In: ESEC/FSE’15. ACM, Bergamo, pp 637–648. http://dl.acm.org/authorize?N14200

Acknowledgements

This work was supported by the National Science Foundation (U.S.) under Grant No. 1633437 and Grant No. 1901102.

Author information

Corresponding author

Correspondence to Tapajit Dey.

Additional information

Communicated by: Shane McIntosh, Leandro L. Minku, Ayşe Tosun, and Burak Turhan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Predictive Models and Data Analytics in Software Engineering (PROMISE)

Cite this article

Dey, T., Mockus, A. Deriving a usage-independent software quality metric. Empir Software Eng 25, 1596–1641 (2020). https://doi.org/10.1007/s10664-019-09791-w
