Experience report: investigating bug fixes in machine learning frameworks/libraries

  • Research Article
  • Frontiers of Computer Science

Abstract

Machine learning (ML) techniques and algorithms have been widely and successfully applied in many areas, including software engineering tasks. As in other software projects, bugs are common in ML projects and libraries. To better understand the characteristics of bug fixing in ML projects, we conduct an empirical study of 939 bugs from five ML projects, manually examining bug categories, fixing patterns, fixing scale, fixing duration, and types of maintenance. The results show that (1) seven types of bugs commonly occur in ML programs; (2) twelve fixing patterns are typically used to fix bugs in ML programs; (3) 68.80% of the patches are micro-scale or small-scale fixes; (4) 66.77% of the bugs in ML programs can be fixed within one month; and (5) 45.90% of the bug fixes are corrective activities from the perspective of software maintenance. In addition, we conduct a questionnaire survey of developers and users of ML projects to validate these results. The results of our empirical study are largely consistent with the feedback from developers. The findings provide guidance and insights that can help developers and users detect and fix bugs in ML projects more effectively.

Acknowledgements

Special thanks to the participants in our survey who provided useful feedback. This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61872312, 61972335, 61472344, 61611540347, 61402396 and 61662021), the Open Funds of State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2020B15 and KFKT2020B16), the Jiangsu “333” Project, the Six Talent Peaks Project in Jiangsu Province (RJFW-053), the Natural Science Foundation of Jiangsu (BK20181353), the Yangzhou city-Yangzhou University Science and Technology Cooperation Fund Project (YZU201803), the CERNET Innovation Project (NGII20180607), and the Yangzhou University Top-level Talents Support Program (2019).

Author information

Corresponding author

Correspondence to Rongcun Wang.

Additional information

Xiaobing Sun received his PhD degree from Southeast University, China in 2012. He is currently a professor in the School of Information Engineering, Yangzhou University, China. His research interests include intelligent software engineering and software data analytics.

Tianchi Zhou received his bachelor's degree from Yangzhou University, China in 2019. He is now pursuing his master's degree at the University of Chinese Academy of Sciences, China. His research interests include software defect prediction and natural language processing.

Rongcun Wang received his PhD degree from Huazhong University of Science and Technology, China in 2015. He is currently an associate professor in the School of Computer Science and Technology, China University of Mining and Technology, China. His research interests include software testing, fault localization, and software maintenance.

Yucong Duan received his PhD degree from the Institute of Software, Chinese Academy of Sciences, China in 2006. He is currently a professor and director of the Data Science and Technology Department at Hainan University, China. His research interests include software modeling, knowledge engineering, and artificial intelligence.

Lili Bo received her PhD degree from China University of Mining and Technology, China in 2019. She is currently a lecturer in the School of Information Engineering, Yangzhou University, China. Her current research interests include software testing and software security.

Jianming Chang is currently a student at Yangzhou University, China. His main research interest is bug localization.

About this article

Cite this article

Sun, X., Zhou, T., Wang, R. et al. Experience report: investigating bug fixes in machine learning frameworks/libraries. Front. Comput. Sci. 15, 156212 (2021). https://doi.org/10.1007/s11704-020-9441-1

  • DOI: https://doi.org/10.1007/s11704-020-9441-1