Deriving a usage-independent software quality metric

Abstract

Context

The extent of post-release use of software affects the number of faults, thus biasing quality metrics and adversely affecting associated decisions. The proprietary nature of usage data limited deeper exploration of this subject in the past.

Objective

To determine how software faults and software use are related and how, based on that, an accurate quality measure can be designed.

Method

Via Google Analytics we measure new users, usage intensity, usage frequency, exceptions, and release date and duration for complex proprietary mobile applications for Android and iOS. We utilize Bayesian Network and Random Forest models to explain the interrelationships and to derive a usage-independent release quality measure. To increase external validity, we also investigate the interrelationships among various code complexity measures, usage (downloads), and the number of issues for 520 NPM packages. We derived a usage-independent quality measure from these analyses and applied it to 4430 popular NPM packages to construct timelines comparing the perceived quality (number of issues) and our derived measure of quality over the lifetime of these packages.
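
To make the modeling step concrete, the following is a minimal illustrative sketch in R using the bnlearn package cited in the references together with the randomForest package; the data frame `releases`, its column names, and all values are synthetic placeholders, not the authors' actual data or code.

```r
# Minimal sketch (synthetic data, not the study's): learn a Bayesian network
# and fit a random forest over release-level usage and fault counts.
library(bnlearn)       # Bayesian network structure learning (Scutari 2010)
library(randomForest)  # random forest models

set.seed(1)
n <- 200
releases <- data.frame(
  new_users        = as.numeric(rpois(n, 500)),   # hypothetical columns
  usage_intensity  = rnorm(n, 30, 5),
  usage_frequency  = rnorm(n, 4, 1),
  release_duration = as.numeric(rpois(n, 30))
)
releases$exceptions <- as.numeric(rpois(n, 0.02 * releases$new_users))

# Learn a network structure with hill-climbing and fit its parameters.
dag <- hc(releases)
bn  <- bn.fit(dag, releases)

# A random forest to gauge which predictors matter for exceptions.
rf <- randomForest(exceptions ~ ., data = releases, importance = TRUE)
print(importance(rf))

# A usage-independent release quality measure in the spirit of the paper:
# exceptions (crashes) normalized by the number of new users.
releases$quality <- releases$exceptions / releases$new_users
```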

Results

We found the number of new users to be the primary factor determining the number of exceptions, and found no direct link between the intensity and frequency of software usage and software faults. Crashes increased as the 1.02–1.04 power of the number of new users for the Android app and as the 1.6 power for the iOS app. Release quality expressed as crashes per user was independent of other usage-related predictors, thus serving as a usage-independent measure of software quality. Usage also affected quality in NPM, where downloads were strongly associated with the number of issues, even after taking other code complexity measures into consideration. Unlike the mobile case, where exceptions per user decreased over time, the number of issues per download increased over time for 45.8% of the NPM packages.
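
A minimal sketch (again with synthetic data, not the study's) of how such a power-law exponent can be estimated: regress log crashes on log new users, so that the slope of the fit is the exponent.

```r
# Simulate new users and crashes with a known exponent of ~1.03, then
# recover the exponent from a log-log linear regression.
set.seed(2)
new_users <- rpois(500, 400)
crashes   <- rpois(500, 0.05 * new_users^1.03)

fit <- lm(log(crashes + 1) ~ log(new_users))   # +1 guards against log(0)
k   <- unname(coef(fit)[2])                    # estimated exponent
cat("Estimated power-law exponent k =", round(k, 2), "\n")
```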

Conclusions

We expect our results and our proposed quality measure to help gauge the release quality of software more accurately and to inspire further research in this area.

Notes

  1. In fact, we mean to measure one aspect of the quality of software

  2. https://support.avaya.com/products/P1574/avaya-equinox-for-android

  3. https://support.avaya.com/products/P0949/avaya-onex-mobile-sip-for-ios

  4. https://www.npmjs.com/package/escomplex#result-format

  5. A generative model specifies a joint probability distribution over all observed variables, whereas a discriminative model (like the ones obtained from regression or decision trees) models only the target variable(s) conditional on the predictor variables. Thus, while a discriminative model allows only sampling of the target variables conditional on the predictors, a generative model can be used, for example, to simulate (i.e. generate) values of any variable in the model. This is why generative models are essential for gaining an understanding of the underlying mechanics of a system (see the sketch following these notes).

  6. Hartemink’s pairwise mutual information method (Hartemink 2001).

  7. One extra / missing / reversed edge
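
The sketch below illustrates the generative/discriminative distinction of note 5 using bnlearn: a fitted Bayesian network can simulate all variables jointly, whereas a regression model only predicts its target given the predictors. The data frame and its columns are synthetic placeholders, not the authors' data or code.

```r
library(bnlearn)

set.seed(3)
d <- data.frame(new_users = rnorm(200, 500, 50),
                intensity = rnorm(200, 30, 5))
d$exceptions <- 0.02 * d$new_users + rnorm(200)

# Generative: the fitted network can simulate (generate) joint values of
# every variable in the model.
bn <- bn.fit(hc(d), d)
simulated <- rbn(bn, n = 1000)

# Discriminative: a regression only yields the target variable conditional
# on supplied predictor values.
disc <- lm(exceptions ~ new_users + intensity, data = d)
pred <- predict(disc, newdata = d)
```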

References

  • Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? An empirical case study on npm. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 385–395

  • Hauser A, Bühlmann P (2012) Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. J Mach Learn Res 13:2409–2464. http://jmlr.org/papers/v13/hauser12a.html

  • Amreen S, Bichescu B, Bradley R, Dey T, Ma Y, Mockus A, Mousavi S, Zaretzki R (2019) A methodology for measuring FLOSS ecosystems. Springer, Singapore

  • Balov N, Salzman P (2016) catnet: categorical Bayesian network inference. https://CRAN.R-project.org/package=catnet. R package version 1.15.0

  • Boehm BW, Brown JR, Lipow M (1976) Quantitative evaluation of software quality. In: Proceedings of the 2nd international conference on software engineering. IEEE Computer Society Press, pp 592–605

  • Borges H, Hora A, Valente MT (2016) Understanding the factors that impact the popularity of github repositories. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 334–344

  • Bottcher SG, Dethlefsen C (2013) deal: learning Bayesian networks with mixed variables. https://CRAN.R-project.org/package=deal. R package version 1.2-37

  • Briand LC, Wüst J, Daly JW, Porter DV (2000) Exploring the relationships between design measures and software quality in object-oriented systems. J Syst Software 51(3):245–273

  • Chatzidimitriou KC, Papamichail MD, Diamantopoulos T, Tsapanos M, Symeonidis AL (2018) npm-miner: an infrastructure for measuring the quality of the npm registry. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 42–45

  • Chickering DM (1996) Learning Bayesian networks is NP-complete. Learning from data: artificial intelligence and statistics V 112:121–130

  • Chlebus BS, Nguyen SH (1998) On finding optimal discretizations for two attributes. In: International conference on rough sets and current trends in computing. Springer, pp 537–544

  • Dalal SR, Mallows CL (1988) When should one stop testing software? J Am Stat Assoc 83(403):872–879

  • David (2014) https://developers.slashdot.org/story/17/01/14/0222245/nodejss-npm-is-now-the-largest-package-registry-in-the-world

  • Dey T, Mockus A (2018a) Are software dependency supply chain metrics useful in predicting change of popularity of npm packages? In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 66–69

  • Dey T, Mockus A (2018b) Modeling relationship between post-release faults and usage in mobile software. In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 56–65

  • Dey T, Ma Y, Mockus A (2019) Patterns of effort contribution and demand and user classification based on participation patterns in npm ecosystem. In: Proceedings of the fifteenth international conference on predictive models and data analytics in software engineering. ACM, pp 36–45

  • Duc AN, Mockus A, Hackbarth R, Palframan J (2014) Forking and coordination in multi-platform development: a case study. In: ESEM, Torino, pp 59:1–59:10. http://dl.acm.org/authorize?N14215

  • Fenton N, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689

  • Fenton N, Krause P, Neil M (2002) Software measurement: uncertainty and causal modeling. IEEE Softw 19(4):116–122

  • Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using Bayesian nets. Inf Softw Technol 49(1):32–43

  • Fenton N, Neil M, Marquez D (2008) Using Bayesian networks to predict software defects and reliability. Proceedings of the Institution of Mechanical Engineers Part O: Journal of Risk and Reliability 222(4):701–712

  • Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc, pp 196–205

  • Geron T (2012) Do ios apps crash more than android apps? A data dive. https://www.forbes.com/sites/tomiogeron/2012/02/02/does-ios-crash-more-than-android-a-data-dive

  • Hackbarth R, Mockus A, Palframan J, Sethi R (2016a) Customer quality improvement of software systems. IEEE Softw 33(4):40–45

  • Hackbarth R, Mockus A, Palframan J, Sethi R (2016b) Improving software quality as customers perceive it. IEEE Softw 33(4):40–45

  • Hahsler M, Chelluboina S, Hornik K, Buchta C (2011) The arules R-package ecosystem: analyzing interesting patterns from large transaction datasets. J Mach Learn Res 12:1977–1981. http://jmlr.csail.mit.edu/papers/v12/hahsler11a.html

  • Hartemink AJ (2001) Principled computational methods for the validation and discovery of genetic regulatory networks. Ph.D. thesis Massachusetts Institute of Technology

  • Herbsleb JD, Mockus A (2003) An empirical study of speed and communication in globally-distributed software development. IEEE Trans Softw Eng 29(6):481–494

  • Jones C (2011) Software quality in 2011: a survey of the state of the art. Namcook Analytics LLC. http://sqgne.org/presentations/2011-12/Jones-Sep-2011.pdf

  • Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47(11):1–26. http://www.jstatsoft.org/v47/i11/

  • Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773. http://doi.ieeecomputersociety.org/10.1109/TSE.2012.70

  • Kan SH (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co. Inc

  • Kenny GQ (1993) Estimating defects in commercial software during operational use. IEEE Trans Reliab 42(1):107–115

  • Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality?: An empirical case study of mozilla firefox. In: Proceedings of the 9th IEEE Working conference on mining software repositories. IEEE Press, pp 179–188

  • Kitchenham B, Pfleeger SL (1996) Software quality: the elusive target [special issues section]. IEEE Softw 13(1):12–21

  • Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT press

  • Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: do people and participation matter?. In: 2015 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 111–120

  • Li PL, Kivett R, Zhan Z, Jeon Se, Nagappan N, Murphy B, Ko AJ (2011) Characterizing the differences between pre-and post-release versions of software. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 716–725

  • Scutari M (2010) Learning Bayesian networks with the bnlearn R package. J Stat Softw 35(3):1–22. http://www.jstatsoft.org/v35/i03/

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and itk projects. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 192–201

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189. https://doi.org/10.1007/s10664-015-9381-9

  • Mockus A (2007) Software support tools and experimental work. In: Basili V et al. (eds) Empirical software engineering issues: critical assessments and future directions, vol LNCS 4336. Springer, pp 91–99

  • Mockus A (2013) Law of minor release: more bugs implies better software quality. http://mockus.org/papers/IWPSE13.pdf. International Workshop on Principles of Software Evolution, St Petersburg, Russia, Aug 18-19 2013. Keynote

  • Mockus A (2014) Engineering big data solutions. In: ICSE’14 FOSE, pp 85–99. http://dl.acm.org/authorize?N14216

  • Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180

  • Mockus A, Weiss D (2008a) Interval quality: relating customer-perceived quality to process quality. In: 2008 International conference on software engineering. ACM Press, Leipzig, pp 733–740. http://dl.acm.org/authorize?063910

  • Mockus A, Weiss D (2008b) Interval quality: relating customer-perceived quality to process quality. In: Proceedings of the 30th international conference on software engineering. ACM, pp 723–732

  • Mockus A, Zhang P, Li P (2005) Drivers for customer perceived software quality. In: ICSE 2005. ACM Press, St Louis, pp 225–233. http://dl.acm.org/authorize?860140

  • Mockus A, Zhang P, Li PL (2005) Predictors of customer perceived software quality. In: 27th International conference on software engineering, 2005. ICSE 2005. Proceedings. IEEE, pp 225–233

  • Mockus A, Hackbarth R, Palframan J (2013) Risky files: an approach to focus quality improvement effort. In: 9th Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp 691–694. http://dl.acm.org/authorize?6845890

  • Motulsky H (2014) When is R squared negative? Cross Validated. https://stats.stackexchange.com/q/12991 (version: 2014-05-06)

  • Nagarajan R, Scutari M, Lèbre S (2013) Bayesian networks in R, vol 122. Springer, pp 125–127

  • Neil M, Fenton N (1996) Predicting software quality using Bayesian belief networks. In: Proceedings of the 21st annual software engineering workshop. NASA Goddard Space Flight Centre, pp 217–230

  • Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181

  • Pai GJ, Dugan JB (2007) Empirical analysis of software fault content and fault proneness using Bayesian methods. IEEE Trans Softw Eng 33(10):675–686

  • Pearl J (2011) Bayesian networks. Department of Statistics UCLA

  • Pendharkar PC, Subramanian GH, Rodger JA (2005) A probabilistic model for predicting software development effort. IEEE Trans Softw Eng 31(7):615–624

  • Perez A, Larranaga P, Inza I (2006) Supervised classification with conditional gaussian networks: increasing the structure complexity from naive Bayes. Int J Approx Reason 43(1):1–25

  • R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 202–212

  • Rotella P, Chulani S (2011) Implementing quality metrics and goals at the corporate level. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 113–122

  • Rubin J, Rinard M (2016) The challenges of staying together while moving fast: an exploratory study. In: Proceedings of the 38th international conference on software engineering. ACM, pp 982–993

  • Schulmeyer GG, McManus JI (1992) Handbook of software quality assurance. Van Nostrand Reinhold Co

  • Scutari M (2013) Learning Bayesian networks in R, an example in systems biology. http://www.bnlearn.com/about/slides/slides-useRconf13.pdf

  • Scutari M, Strimmer K (2010) Introduction to graphical modelling. arXiv:1005.1036

  • Shmueli G (2010) To explain or to predict? Stat Sci, 289–310

  • Sober E (2002) Instrumentalism, parsimony, and the akaike framework. Philos Sci 69(S3):S112–S123

  • Stamelos I, Angelis L, Dimou P, Sakellaris E (2003) On the use of Bayesian belief networks for the prediction of software productivity. Inf Softw Technol 45(1):51–60

  • Subramanyam R, Krishnan MS (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310

  • Voss L (2014) Numeric precision matters: how npm download counts work. https://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts

  • Voss L (2018) The state of javascript frameworks, 2017. https://www.npmjs.com/npm/state-of-javascript-frameworks-2017-part-1

  • Wittern E, Suter P, Rajagopalan S (2016) A look at the dynamics of the javascript package ecosystem. In: 2016 IEEE/ACM 13th Working conference on mining software repositories (MSR). IEEE, pp 351–361

  • Yu P, Systä T, Müller H (2002) Predicting fault-proneness using OO metrics: an industrial case study. In: Sixth European conference on software maintenance and reengineering, 2002. Proceedings. IEEE, pp 99–107

  • Zerouali A, Constantinou E, Mens T, Robles G, González-Barahona J (2018) An empirical analysis of technical lag in npm package dependencies. In: International conference on software reuse. Springer, pp 95–110

  • Zhang F, Mockus A, Keivanloo I, Zou Y (2015) Towards building a universal defect prediction model with rank transformed predictors. Empir Softw Eng, 1–39

  • Zheng Q, Mockus A, Zhou M (2015) A method to identify and correct problematic software activity data: exploiting capacity constraints and data redundancies. In: ESEC/FSE’15. ACM, Bergamo, pp 637–648. http://dl.acm.org/authorize?N14200

Acknowledgements

This work was supported by the National Science Foundation (U.S.) under Grant No. 1633437 and Grant No. 1901102.

Author information

Corresponding author

Correspondence to Tapajit Dey.

Additional information

Communicated by: Shane McIntosh, Leandro L. Minku, Ayşe Tosun, and Burak Turhan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Predictive Models and Data Analytics in Software Engineering (PROMISE)

Cite this article

Dey, T., Mockus, A. Deriving a usage-independent software quality metric. Empir Software Eng 25, 1596–1641 (2020). https://doi.org/10.1007/s10664-019-09791-w
