Automatically recommending components for issue reports using deep learning

Empirical Software Engineering

Abstract

Today’s software development is typically driven by incremental changes made to implement new functionality, fix bugs, or improve performance and security. Each change request is often described as an issue. Recent studies suggest that the set of components (e.g., software modules) relevant to resolving an issue is among the most important pieces of information provided with the issue, and one that software engineers often rely on. However, assigning an issue to the correct component(s) is challenging, especially in large-scale projects that can have hundreds of components. In this paper, we propose a predictive model that learns from historical issue reports and recommends the most relevant components for new issues. Our model uses Long Short-Term Memory (LSTM), a deep learning technique, to automatically learn semantic features representing an issue report, and combines them with traditional textual similarity features. An extensive evaluation on 142,025 issues from 11 large projects shows that our approach outperforms one common baseline, two state-of-the-art techniques, and six alternative techniques, with an average improvement in predictive performance of 16.70%–66.31% across all projects.
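To make the "textual similarity" half of the approach concrete, the following is a minimal sketch, not the authors' implementation. It assumes TF-IDF vectors with cosine similarity as the similarity measure, and it scores each component by the most similar historical issue assigned to it; both choices are illustrative assumptions. The LSTM-learned semantic features described in the abstract are not modelled here, and the function names, issue texts, and component labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for tokenized docs, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so a term that occurs in every document keeps a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_components(history, new_issue, k=3):
    """Return the top-k components for new_issue.

    history is a list of (issue_text, [component, ...]) pairs.
    Assumption: a component's score is the highest similarity between the
    new issue and any historical issue assigned to that component.
    """
    docs = [tokenize(text) for text, _ in history]
    vectors, idf = tfidf_vectors(docs)
    # Terms never seen in the history carry no weight in the query vector.
    counts = Counter(tokenize(new_issue))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = defaultdict(float)
    for vec, (_, components) in zip(vectors, history):
        sim = cosine(query, vec)
        for component in components:
            scores[component] = max(scores[component], sim)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [component for component, _ in ranked[:k]]


# Hypothetical historical issues and component labels, for illustration only.
history = [
    ("Login page throws error when password is empty", ["authentication"]),
    ("Quiz timer does not stop after submission", ["quiz"]),
    ("Password reset email is never sent", ["authentication", "email"]),
    ("Gradebook export to CSV drops the last column", ["gradebook"]),
]
top = recommend_components(history, "Error on login with wrong password", k=2)
print(top)
```

In a combined model of the kind the abstract describes, similarity scores like these would be one group of input features alongside the learned semantic representation, rather than the final ranking on their own.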

Notes

  1. We use “assigned to” to denote the identification of the relation between an issue and the set of components relevant to the resolution of that issue.

  2. https://moodle.org

  3. https://jira.atlassian.com/projects/JRACLOUD/summary

  4. https://www.atlassian.com/software/jira

  5. https://tracker.moodle.org/browse/MDL-56364

  6. https://www.bugzilla.org/

  7. The number k of top-ranked components to recommend is specified by the user.

  8. The model was implemented in Python using Theano (Team 2016).

  9. https://keras.io/

  10. https://code.google.com/archive/p/word2vec/

  11. We used the Doc2Vec implementation in Gensim: https://radimrehurek.com/gensim/models/doc2vec.html

References

  • Al-Kofahi JM, Tamrawi A, Nguyen TN (2010) Fuzzy set approach for automatic tagging in evolving software. In: Proceedings of the international conference on software maintenance (ICSM). https://doi.org/10.1109/ICSM.2010.5609751, pp 1–10

  • Alencar da Costa D, Abebe SL, McIntosh S, Kulesza U, Hassan AE (2014) An empirical study of delays in the integration of addressed issues. In: Proceedings of the international conference on software maintenance and evolution (ICSME), IEEE, pp 281–290

  • Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc YG (2008) Is it a bug or an enhancement?: a text-based approach to classify change requests. In: Proceedings of the conference of the center for advanced studies on collaborative research: meeting of minds, ACM. https://doi.org/10.1145/1463788.1463819, pp 304–318

  • Anvik J, Murphy GC (2011) Reducing the effort of bug report triage. ACM Trans Softw Eng Methodol 20(3):1–35

  • Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering (ICSE), ACM Press, New York, USA, pp 361–370

  • Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd international conference on software engineering (ICSE). https://doi.org/10.1145/1985793.1985795, pp 1–10

  • Atzmueller M, Chin A, Scholz C, Trattner C (2015) Mining, modeling, and recommending ‘things’ in social media. Lect Notes Comput Sci 8940:55–74. https://doi.org/10.1007/978-3-319-14723-9

  • Baroni M, Dinu G, Kruszewski G (2014) Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL (1), pp 238–247

  • Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008a) What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering, ACM Press, New York, USA, pp 308–318

  • Bettenburg N, Premraj R, Zimmermann T (2008b) Duplicate bug reports considered harmful … really? In: Proceedings of the international conference on software maintenance (ICSM), pp 337–345

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(4-5):993–1022

  • Cherman EA, Monard MC, Metz J (2011) Multi-label problem transformation methods: a case study. CLEI Electron J 14(1):1–10

  • Choetkiertikul M, Dam KH, Tran T, Pham TTM, Ghose A (2018) Poster: predicting components for issue reports using deep. In: Proceedings of the 40th international conference on software engineering (ICSE) poster track, pp 244–245

  • Cottrell R, Walker RJ, Denzinger J (2008) Semi-automating small-scale source code reuse via structural correspondence. Science 214–225. https://doi.org/10.1145/1453101.1453130

  • Cubranic D, Murphy G (2004) Automatic bug triage using text categorization. In: Proceedings of the 16th international conference on software engineering & knowledge engineering (SEKE), pp 92–97

  • Dam H, Tran T, Pham T (2016) A deep language model for software code. arXiv:1608.02715 (August):1–4

  • Denninger O (2012) Recommending relevant code artifacts for change requests using multiple predictors. In: Proceedings of the 3rd International Workshop on Recommendation Systems for Software Engineering (RSSE). https://doi.org/10.1109/RSSE.2012.6233416, pp 78–79

  • Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. Advances in Neural Information Processing Systems 14:681–687

  • Fu W, Menzies T (2017) Easy over hard: a case study on deep learning. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Association for Computing Machinery, New York, NY, USA, ESEC/FSE 2017. https://doi.org/10.1145/3106237.3106256, pp 49–60

  • Gasparic M, Janes A (2016) What recommendation systems for software engineering recommend: a systematic literature review. J Syst Softw 113:101–113. https://doi.org/10.1016/j.jss.2015.11.036

  • Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471

  • Glasmachers T (2017) Limits of end-to-end learning. In: Proceedings of the 9th Asian conference on machine learning, pp 17–32

  • Graves A, Mohamed AR, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), 2013, IEEE, pp 6645–6649

  • Gu X, Zhang H, Zhang D, Kim S (2016) Deep API learning. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering, ACM, FSE 2016, pp 631–642

  • Gutmann MU, Hyvärinen A (2012) Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J Mach Learn Res 13:307–361

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies

  • Hu H, Zhang H, Xuan J, Sun W (2014) Effective bug triage based on historical bug-fix information. In: Proceedings of the international Symposium on Software Reliability Engineering (ISSRE). https://doi.org/10.1109/ISSRE.2014.17, pp 122–132

  • Iqbal A (2014) Understanding contributor to developer turnover patterns in oss projects: a case study of apache projects. ISRN Softw Eng 2014:1–10. https://doi.org/10.1155/2014/535724

  • Jalbert N, Weimer W (2008) Automated duplicate detection for bug tracking systems. In: Proceedings of the international conference on dependable systems and networks with FTCS and DCC (DSN), IEEE, pp 52–61

  • James ER (2002) Some implications of remedial and preventive legislation in the United States. Am J Sociol 18(6):769–783. https://doi.org/10.1086/212157,1603.06111

  • Jindal R, Malhotra R, Jain A (2017) Prediction of defect severity by mining software project reports. International Journal of System Assurance Engineering and Management 8(2):334–351. https://doi.org/10.1007/s13198-016-0438-y

  • Johnson R, Zhang T (2015) Effective use of word order for text categorization with convolutional neural networks. In: NAACL HLT 2015 - 2015 conference of the North American chapter of the association for computational linguistics: human language technologies, proceedings of the conference, 2011, pp 103–112

  • Jones C (2004) Software project management practices: failure versus success. CrossTalk: The Journal of Defense Software Engineering 17(10):5–9

  • Kakarontzas G, Stamelos I, Skalistis S, Naskos A (2012) Extracting components from open source: the component adaptation environment (COPE) approach. In: Proceedings of the 38th EUROMICRO conference on software engineering and advanced applications (SEAA). https://doi.org/10.1109/SEAA.2012.39, pp 192–199

  • Kerzner H, Kerzner HR (2017) Project management: a systems approach to planning, scheduling, and controlling. Wiley

  • Kochhar PS, Thung F, Lo D (2014) Automatic fine-grained issue report reclassification. In: Proceedings of the IEEE international conference on engineering of complex computer systems (ICECCS), pp 126–135

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  • Kumari M, Singh VB (2020) An improved classifier based on entropy and deep learning for bug priority prediction. In: Intelligent systems design and applications, Springer International Publishing, pp 571–580

  • Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2016) Combining deep learning with information retrieval to localize buggy files for bug reports. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering (ASE). https://doi.org/10.1109/ASE.2015.73, pp 476–481

  • Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2017) Bug localization with combination of deep learning and information retrieval. In: Proceedings of the 25th IEEE/ACM international conference on program comprehension (ICPC). https://doi.org/10.1109/ICPC.2017.24, pp 218–229

  • Lamkanfi A, Demeyer S (2013) Predicting reassignments of bug reports - an exploratory investigation. In: Proceedings of the European conference on software maintenance and reengineering, CSMR. https://doi.org/10.1109/CSMR.2013.42, pp 327–330

  • Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: Proceedings of the 7th IEEE working conference on mining software repositories (MSR), IEEE, pp 1–10

  • Lamkanfi A, Demeyer S, Soetens QD, Verdonckz T (2011) Comparing mining algorithms for predicting the severity of a reported bug. In: Proceedings of the European Conference on Software Maintenance and Reengineering (CSMR). https://doi.org/10.1109/CSMR.2011.31, pp 249–258

  • Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: Proceedings of the 31st international conference on machine learning (ICML). https://doi.org/10.1145/2740908.2742760, vol 32, pp 1188–1196

  • Lederer AL, Prasad J (1992) Nine management guidelines for better cost estimating. Commun ACM 35(2):51–59

  • Lee SR, Heo MJ, Lee CG, Kim M, Jeong G (2017) Applying deep learning based automatic bug triager to industrial projects. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering - ESEC/FSE 2017, pp 926–931

  • Li L, Feng H, Zhuang W, Meng N, Ryder B (2017) CCLearner: a deep learning-based clone detection approach. In: IEEE international conference on software maintenance and evolution (ICSME ’17), pp 249–260

  • Linares-Vásquez M, McMillan C, Poshyvanyk D, Grechanik M (2014) On using machine learning to automatically classify software applications into domain categories. Empir Softw Eng 19:582–618. https://doi.org/10.1007/s10664-012-9230-z

  • Mani S, Sankaran A, Aralikatte R (2019) Deeptriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings of the ACM India joint international conference on data science and management of data - CoDS-COMAD ’19, pp 171–179

  • McCallum A, Nigam K (1998) A comparison of event models for naïve Bayes text classification. In: Proceedings of the AAAI-98 workshop on learning for text categorization, AAAI Press, pp 41–48

  • Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: Proceedings of the international conference on software maintenance (ICSM), IEEE, pp 346–355

  • Mohammad F (2018) Is preprocessing of text really worth your time for toxic comment classification? In: Proceedings of the International Conference on Artificial Intelligence (ICAI) 1(1):447–453. arXiv:1806.02908

  • Muller K (1989) Statistical power analysis for the behavioral sciences. Technometrics 31(4):499–500

  • Nam J, Kim J, Mencía EL, Gurevych I, Fürnkranz J (2013) Large-scale multi-label text classification - revisiting neural networks. In: Machine learning and knowledge discovery in databases. ECML PKDD 2014. Lecture notes in computer science. arXiv:1312.5419, pp 437–452

  • Navarro-Almanza R, Juárez-Ramírez R, Licea G (2018) Towards supporting software engineering using deep learning: a case of software requirements classification. In: Proceedings - 2017 5th international conference in software engineering research and innovation, CONISOFT 2017 2018-Janua:116–120. https://doi.org/10.1109/CONISOFT.2017.00021

  • Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE). https://doi.org/10.1109/ASE.2011.6100062, pp 263–272

  • Otoom AF, Al-shdaifat D, Hammad M, Abdallah EE (2016) Severity prediction of software bugs. In: Proceedings of the 7th international conference on information and communication systems (ICICS), pp 92–95

  • Pandey N, Sanyal DK, Hudait A, Sen A (2017) Automated classification of software issue reports using machine learning techniques: an empirical study. Innov Syst Softw Eng 13(4):279–297. https://doi.org/10.1007/s11334-017-0294-1

  • Park YJ, Tuzhilin A (2008) The long tail of recommender systems and how to leverage it. In: Proceedings of the 2008 ACM conference on Recommender systems - RecSys ’08, p 11

  • Project Management Institute Inc (2000) A guide to the project management body of knowledge (PMBOK guide). Project Management Institute. https://doi.org/10.5860/CHOICE.34-1636, ISBN 978-1-933890-51-7

  • Rahman MM, Ruhe G, Zimmermann T (2009) Optimized assignment of developers for fixing bugs an initial evaluation for eclipse projects. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, IEEE, pp 439–442

  • Robillard MP, Walker RJ, Zimmermann T (2010) Recommendation systems for software engineering. IEEE Softw 27(4):80–86. https://doi.org/10.1109/MS.2009.161

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th international conference on software engineering (ICSE), IEEE, pp 499–510

  • Saha RK, Saha AK, Perry DE (2013) Toward understanding the causes of unanswered questions in software information sites: a case study of stack overflow. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. https://doi.org/10.1145/2491411.2494585, pp 663–666

  • Saini V, Farmahinifarahani F, Lu Y, Baldi P, Lopes CV (2018) Oreo: detection of clones in the twilight zone. In: Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE ’18), ACM Press, pp 354–365

  • Sarro F, Petrozziello A, Harman M (2016) Multi-objective software effort estimation. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 619–630

  • Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003, arXiv:1404.7828

  • Somasundaram K, Murphy GC (2012) Automatic categorization of bug reports using latent Dirichlet allocation. In: Proceedings of the 5th India software engineering conference (ISEC). https://doi.org/10.1145/2134254.2134276, pp 125–130

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

  • Steck H (2010) Training and testing of recommender systems on data missing not at random. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. https://doi.org/10.1145/1835804.1835895, pp 713–722

  • Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 26th IEEE/ACM international conference on automated software engineering (ASE), IEEE, pp 253–262

  • Sundermeyer M, Schlüter R, Ney H (2012) LSTM neural networks for language modeling. In: INTERSPEECH, pp 194–197

  • Sureka A (2012) Learning to classify bug reports into components. In: Proceedings of the 50th international conference on objects, models, components, patterns, Springer. https://doi.org/10.1007/978-3-642-30561-0_20, pp 288–303

  • Team TD (2016) Theano: a python framework for fast computation of mathematical expressions. arXiv:http:arxiv.org/abs/1605.0http://deeplearning.net/software/theano

  • Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of the working conference on reverse engineering (WCRE), pp 205–214

  • Tian Y, Lo D, Xia X, Sun C (2015) Automated prediction of bug report priority using multi-factor analysis. Empir Softw Eng 20(5):1354–1383

  • Vargas-Baldrich S, Linares-Vásquez M, Poshyvanyk D (2016) Automated tagging of software projects using bytecode and dependencies. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering (ASE). https://doi.org/10.1109/ASE.2015.38, pp 289–294

  • Vargha A, Delaney HD (2000) A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101–132. https://doi.org/10.3102/10769986025002101

  • Wang S, Lo D, Lawall J (2014a) Compositional vector space models for improved bug localization. In: Proceedings of the 30th international conference on software maintenance and evolution (ICSME). https://doi.org/10.1109/ICSME.2014.39, pp 171–180

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2014b) Entagrec: an enhanced tag recommendation system for software information sites. In: International conference on software maintenance and evolution (ICSME ’14), pp 291–300

  • Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the international conference on software engineering (ICSE). https://doi.org/10.1145/2884781.2884804, vol 14–22, pp 297–308

  • Wang T, Wang H, Yin G, Ling CX, Li X, Zou P (2014c) Tag recommendation for open source software. Front Comput Sci 8(1):69–82

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering (ICSE), pp 461–470

  • White M, Vendome C, Linares-Vásquez M, Poshyvanyk D (2015) Toward deep learning software repositories. In: Proceedings of the 12th working conference on mining software repositories (MSR), pp 334–345

  • White M, Tufano M, Vendome C, Poshyvanyk D (2016) Deep learning code fragments for code clone detection. In: IEEE/ACM international conference on automated software engineering. https://doi.org/10.1145/2970276.2970326, pp 87–98

  • Xi S, Yao Y, Xiao X, Xu F, Lu J (2018) An effective approach for routing the bug reports to the right fixers. In: Proceedings of the tenth Asia-Pacific symposium on internetware - internetware ’18, pp 1–10

  • Xi SQ, Yao Y, Xiao XS, Xu F, Lv J (2019) Bug triaging based on tossing sequence modeling. J Comput Sci Technol 34(5):942–956. https://doi.org/10.1007/s11390-019-1953-5

  • Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: Proceedings of the 10th working conference on mining software repositories (MSR), IEEE. https://doi.org/10.1109/MSR.2013.6624040, pp 287–296

  • Xia X, Lo D, Wen M, Shihab E, Zhou B (2014) An empirical study of bug report field reassignment. In: Proceedings of the conference on software maintenance, reengineering, and reverse engineering, pp 174–183

  • Xia X, Lo D, Ding Y, Al-Kofahi JM, Nguyen TN, Wang X (2016) Improving automated bug triaging with specialized topic model. IEEE Trans Softw Eng 43(3):272–297. https://doi.org/10.1109/TSE.2016.2576454

  • Yan M, Zhang X, Yang D, Xu L, Kymer JD (2016) A component recommender for bug reports using discriminative probability latent semantic analysis. Inf Softw Technol 73:37–51

  • Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: Proceedings of the IEEE international conference on software quality, reliability and security (QRS), 1. https://doi.org/10.1109/QRS.2015.14, pp 17–26

  • Yin H, Cui B, Li J, Yao J, Chen C (2012) Challenging the long tail recommendation. Proceedings of the VLDB Endowment 5(9):896–907. http://dl.acm.org/citation.cfm?doid=2311906.2311916

  • Yin W, Kann K, Yu M, Schütze H (2017) Comparative study of CNN and RNN for natural language processing. arXiv:1702.01923

  • Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4694–4702

  • Zanoni M, Perin F, Fontana FA, Viscusi G (2014) Dual analysis for recommending developers to resolve bugs. Journal of Software: Evolution and Process 26(12):1172–1192

  • Zhang M, Zhou Z (2006) Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18(10):1338–1351

  • Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? In: Proceedings of the 34th international conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2012.6227210, pp 14–24

  • Zhou P, Liu J, Yang Z, Zhou G (2017) Scalable tag recommendation for software information sites. In: SANER 2017 - 24th IEEE international conference on software analysis, evolution, and reengineering, IEEE, 1, pp 272–282

Author information

Corresponding author

Correspondence to Morakot Choetkiertikul.

Additional information

Communicated by: Bram Adams

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Choetkiertikul, M., Dam, H.K., Tran, T. et al. Automatically recommending components for issue reports using deep learning. Empir Software Eng 26, 14 (2021). https://doi.org/10.1007/s10664-020-09898-5

  • DOI: https://doi.org/10.1007/s10664-020-09898-5
