Predicting software defect type using concept-based classification

Abstract

Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management process. A major challenge for the current supervised-learning-based approaches to this task is the need for labeled training data. Creating such data is an expensive and effort-intensive task requiring domain-specific expertise. In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. Then, we compute the “semantic” similarity between these concept-based representations and assign the software defect type that has the highest similarity to the defect report. The proposed approach achieves accuracy comparable to the state-of-the-art semi-supervised and active learning approach for this task without requiring labeled training data. Additional advantages of the CBC approach are: (i) unlike the state-of-the-art, it does not need the source code used to fix a software defect, and (ii) it does not suffer from the class-imbalance problem faced by the supervised learning paradigm.
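
To make the projection and matching steps above concrete, the following is a minimal sketch of ESA-style concept-based classification. It is a toy illustration under stated assumptions: the three article snippets, the defect-type snippets, and names such as `esa_vector` and `classify` are invented for this example, and the implementation evaluated in the paper builds the concept space from a full Wikipedia dump using Lucene, Mahout, and OpenNLP rather than scikit-learn.

```python
# Minimal sketch of concept-based classification (CBC) via Explicit Semantic
# Analysis (ESA). The tiny article corpus and defect-type snippets are
# illustrative placeholders, not the paper's actual data or implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each Wikipedia article spans one dimension of the concept space.
wikipedia_articles = [
    "Memory leak: the program fails to release memory that it no longer needs.",
    "Graphical user interface: users interact with a button, icon, or text field.",
    "Software documentation: written text that accompanies computer software.",
]
vectorizer = TfidfVectorizer(stop_words="english")
article_matrix = vectorizer.fit_transform(wikipedia_articles)

def esa_vector(text):
    """Project a text into the concept space: one weight per Wikipedia
    article, measuring how strongly the text relates to that article."""
    return cosine_similarity(vectorizer.transform([text]), article_matrix)

# Concept-based representations of the defect-type descriptions
# (paraphrased placeholders, not the actual ODC/IEEE snippets).
defect_types = {
    "Interface": "defect in the graphical user interface, e.g., a button,"
                 " icon, or text field",
    "Documentation": "defect in the written documentation accompanying the software",
    "Memory": "defect where the program fails to release memory (memory leak)",
}
type_vectors = {t: esa_vector(desc) for t, desc in defect_types.items()}

def classify(report_text):
    """Assign the defect type whose concept-based representation is most
    similar ("semantically" closest) to the report's."""
    report_vec = esa_vector(report_text)
    return max(defect_types,
               key=lambda t: cosine_similarity(report_vec, type_vectors[t])[0, 0])

print(classify("The save button overlaps the text field on the settings page"))
```

Because matching happens in the concept space, a report and a defect-type description can be similar even with little direct word overlap; and since the defect types are represented directly from their textual descriptions, no labeled training data is needed.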

Notes

  1. Note that Table 1 and Table 8 contain only the introductory definition snippets from the classification schemes. Their detailed descriptions along with contextual information and examples are available in IBM (2013a, b) and IEEE (2009).

  2. The expert needs to refer to IBM (2013a, b) to get the detailed descriptions and understand the defect type classification scheme.

  3. https://www.wikipedia.org

  4. Following the ESA terminology, we use “a concept” and “a Wikipedia article” interchangeably.

  5. https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

  6. Available from https://dumps.wikimedia.org

  7. Notion of a stub-article in Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Stub

  8. https://issues.apache.org/jira/issues/

  9. Mahout, the machine learning library, https://mahout.apache.org

  10. Lucene, the search engine library, https://lucene.apache.org/core

  11. OpenNLP, the natural language processing library, https://opennlp.apache.org

  12. https://github.com/roundcube/roundcubemail/issues

  13. https://roundcube.net/about/

  14. https://en.wikipedia.org/wiki/Help:Wikitext

  15. https://en.wikipedia.org/wiki/Vandalism_on_Wikipedia#Fighting_vandalism

  16. https://en.wikipedia.org/wiki/Reliability_of_Wikipedia#Assessments

  17. https://archive.ics.uci.edu/ml/datasets/iris

References

  • Alenezi M, Magel K, Banitaan S (2013) Efficient bug triaging using text mining. Journal of Software 8(9):2185–2190

  • Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT ’92, pp 144–152. https://doi.org/10.1145/130385.130401

  • Bridge N, Miller C (1998) Orthogonal defect classification: using defect data to improve software development. Software Quality 3(1):1–8

  • Butcher M, Munro H, Kratschmer T (2002) Improving software testing via ODC: Three case studies. IBM Syst J 41(1):31–44

  • Carrozza G, Pietrantuono R, Russo S (2015) Defect analysis in mission-critical software systems: a detailed investigation. Journal of Software: Evolution and Process 27(1):22–49

  • Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1):1–6. https://doi.org/10.1145/1007730.1007733

  • Chillarege R (1996) Orthogonal defect classification. Handbook of Software Reliability Engineering, pp 359–399

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/BF00994018

  • Čubranić D (2004) Automatic bug triage using text categorization. In: Proceedings of 16th international conference on software engineering & knowledge engineering (SEKE)

  • Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans. Inf. Syst. 29(2):8

  • Ferschke O, Zesch T, Gurevych I (2011) Wikipedia revision toolkit: Efficiently accessing Wikipedia’s edit history. In: Proceedings of the ACL-HLT 2011 system demonstrations, association for computational linguistics, pp 97–102

  • Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of the 20th intl. Joint conf. on artificial intelligence (IJCAI), vol 7, pp 1606–1611

  • Gabrilovich E, Markovitch S (2009) Wikipedia-based semantic interpretation for natural language processing. J Artif Intell Res 34:443–498

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM SIGMOD Record, vol 29. ACM, pp 1–12

  • Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam

  • He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  • Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of 35th international conference on software engineering, pp 392–401

  • Huang L, Ng V, Persing I, Geng R, Bai X, Tian J (2011) AutoODC: Automated generation of orthogonal defect classifications. In: Proceedings of 26th IEEE/ACM international conference on automated software engineering (ASE)

  • Huang L, Ng V, Persing I, Chen M, Li Z, Geng R, Tian J (2015) AutoODC: Automated generation of orthogonal defect classifications. Automated Software Engineering Journal 22(1):3–46

  • IBM (2013a) Orthogonal defect classification version 5.2 extensions for defects in GUI, user documentation, build and national language support (NLS). https://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2-Extensions.pdf, (URL accessibility verified on 9th Nov., 2018)

  • IBM (2013b) Orthogonal defect classification version 5.2 for software design and code. http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf, (URL accessibility verified on 9th Nov., 2018)

  • IEEE (2009) IEEE standard 1044-2009 classification for software anomalies

  • Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5):429–449

  • Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  • Mellegård N, Staron M, Törner F (2012) A light-weight defect classification scheme for embedded automotive software and its initial evaluation. In: Proceedings of IEEE 23rd International Symp. on Software Reliability Engineering (ISSRE), pp 261–270

  • Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: IEEE international conference on software maintenance (ICSM), pp 346–355

  • Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 522–531

  • Patil S (2017) Concept based classification of software defect reports. In: Proceedings of 14th international conference on mining software repositories (MSR), IEEE/ACM

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830

  • Robertson S, Zaragoza H, et al (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4):333–389

  • Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M, et al (1995) Okapi at TREC-3. NIST Special Publication Sp 109:109

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, pp 499–510

  • Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill Inc, New York

  • Silva N, Vieira M (2014) Experience report: orthogonal classification of safety critical issues. In: 2014 IEEE 25th international symposium on software reliability engineering. IEEE, pp 156–166

  • Student (1908) The probable error of a mean. Biometrika 6(1):1–25. https://doi.org/10.1093/biomet/6.1.1

  • Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of 19th working conference on reverse engineering (WCRE). IEEE, pp 205–214

  • Thung F, Le X-BD, Lo D (2015) Active semi-supervised defect categorization. In: Proceedings of IEEE 23rd international conference on program comprehension (ICPC), pp 60–70

  • Vallespir D, Grazioli F, Herbert J (2009) A framework to evaluate defect taxonomies. In: Proceedings of XV Congreso Argentino de Ciencias de La Computación

  • Wagner S (2008) Defect classification and defect types revisited. In: Proceedings of workshop on defects in large software systems. ACM, pp 39–40

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470

  • Xia X, Lo D, Wang X, Zhou B (2014) Automatic defect categorization based on fault triggering conditions. In: Proceedings of 19th international conference on engineering of complex computer systems (ICECCS). IEEE, pp 39–48

  • Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2018.2857768

  • Yang YY, Lee SC, Chung YA, Wu TE, Chen SA, Lin HT (2017) libact: Pool-based active learning in python. Tech. rep., National Taiwan University. https://github.com/ntucllab/libact, available as arXiv:1710.00379

  • Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, Cambridge

  • Zesch T, Müller C, Gurevych I (2008) Extracting lexical semantic knowledge from wikipedia and wiktionary. In: Proceedings of 6th International conference on language resources and evaluation (LREC), vol 8, pp 1646–1652

  • Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. Journal of Software: Evolution and Process 28(3)

Author information

Corresponding author

Correspondence to Sangameshwar Patil.

Additional information

Communicated by: Tim Menzies

A preliminary, work-in-progress version of this work was presented as a short paper – “Concept based Classification of Software Defect Reports”, Sangameshwar Patil, Mining Software Repositories (MSR), 2017. This article is a significantly extended version of the short paper with new results and analysis.

Electronic supplementary material

Below are the links to the electronic supplementary material.

(XLSX 7.68 KB)

(XLSX 54.7 KB)

Appendices

Appendix A: IEEE 1044-2009 Standard based Software Defect Type Classification Scheme

Table 8 The software defect type families based on the sample defect type classification scheme in Table A.1 (Annexure A) of IEEE 1044-2009 Standard (IEEE 2009)

Appendix B: Additional Figures for Experimental Results of RQ2

In this section, we provide additional figures summarizing the experimental results for RQ2, which analyze the effect of varying the number of concepts (N) used in the concept-based representation on the coverage and accuracy of the concept-based classification (CBC) approach. The analysis of these results is discussed in Section 4.3.2; an illustrative sketch of the N-sweep follows.
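
As a rough illustration of this sweep, the sketch below truncates each concept vector to its top-N entries and recomputes a coverage figure. The helper names and the coverage criterion used here (a report counts as covered if its truncated vector shares at least one non-zero concept with some defect type) are assumptions made for this example; the precise definitions of coverage and accuracy used in the experiments are given in Section 4.3.2.

```python
# Illustrative sketch of the N-sweep: keep only the top-N concept weights of
# each ESA vector and recompute coverage. `esa_vector` is the projection
# sketched earlier; the coverage criterion below is an assumption made for
# this example, not necessarily the definition used in the paper.
import numpy as np

def truncate_to_top_n(concept_vec, n):
    """Zero out all but the n largest concept weights."""
    v = np.asarray(concept_vec).ravel().copy()
    if n < v.size:
        v[np.argsort(v)[:-n]] = 0.0
    return v

def coverage(report_vecs, type_vecs, n):
    """Fraction of reports whose top-N concepts overlap the top-N
    concepts of at least one defect type."""
    covered = sum(
        1 for r in report_vecs
        if any((truncate_to_top_n(r, n) * truncate_to_top_n(t, n)).sum() > 0
               for t in type_vecs)
    )
    return covered / len(report_vecs)

# Example sweep over candidate values of N (the values are arbitrary):
# for n in (10, 50, 100, 500, 1000):
#     print(n, coverage(report_vecs, type_vecs, n))
```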

B.1 RQ2 Results for Roundcube Dataset and IEEE-Based Classification Scheme

Fig. 6

Roundcube dataset and IEEE-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

B.2 RQ2 Results for Roundcube Dataset and ODC-Based Classification Scheme

Fig. 7

Roundcube dataset and ODC-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

B.3 RQ2 Results for Apache-Libs Dataset and IEEE-Based Classification Scheme

Fig. 8

Apache-Libs dataset and IEEE-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

Appendix C: Dataset Annotation Details

The annotations for the Apache-Libs dataset by Thung et al. (2012) were done before the IBM ODC version 5.2 and its extensions (IBM 2013a, b) were made available (12th Sept. 2013). This new version of the IBM ODC v5.2 extensions (IBM 2013a) introduces additional defect types; for example, it includes a new National Language Support (NLS) defect type (i.e., “Problems encountered in the implementation of the product functions in languages other than English”). These changes in the ODC scheme could not have been considered by Thung et al. (2012). To account for the changes in the defect type families due to the IBM ODC v5.2 extensions (IBM 2013a), as well as to improve the robustness of this dataset as a benchmark, we re-annotated the dataset. The annotations were done by a software professional with multiple years of experience in software design, development, testing, and debugging.

Out of the 500 defect type annotations in this dataset, 472 match the original annotations by Thung et al. (2012) and 28 disagree. This corresponds to 94.4% agreement with Thung et al. (2012), and the inter-annotator agreement with their original annotations measured using Cohen’s kappa statistic (Cohen 1960) is 90.02%, which is a very high level of inter-annotator agreement. The 28 annotations that differed from the original annotations were further reviewed and verified by another software professional with more than a decade of hands-on experience across the software development life-cycle. This review led to a change in the annotations of 2 of the 28 defect reports with differing annotations; these two cases were discussed by the two annotators and the corrections were approved.
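
As a small worked example of these agreement statistics, the sketch below computes the observed agreement and Cohen’s kappa for two annotators’ label lists. The four-item lists are hypothetical stand-ins for the 500 real annotations, which are available in the Supplementary Material.

```python
# Sketch of the agreement computation: observed agreement p_o (matching
# annotations / total; 472/500 = 94.4% for our dataset) and Cohen's kappa,
# which discounts the agreement p_e expected by chance from each annotator's
# label distribution: kappa = (p_o - p_e) / (1 - p_e).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["Interface", "Algorithm", "Interface", "Documentation"]  # hypothetical
annotator_b = ["Interface", "Algorithm", "Checking", "Documentation"]   # hypothetical

p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"observed agreement = {p_o:.3f}, Cohen's kappa = {kappa:.3f}")
```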

We make the annotated dataset available for research purposes as Supplementary Material along with the paper, as well as on email request. The high level of inter-annotator agreement (94.4% matching annotations and Cohen’s κ = 90.02%), together with the explanatory comments for the few differing annotations, makes this dataset a high-quality benchmark for the software defect type classification task. Table 5 shows the dataset statistics and the label distribution in the ground truth annotations. For the other combinations of datasets and classification schemes used in this paper, the annotation process was similar. Details of the inter-annotator agreement for these other combinations are given in Section 4.1.

Cite this article

Patil, S., Ravindran, B. Predicting software defect type using concept-based classification. Empir Software Eng 25, 1341–1378 (2020). https://doi.org/10.1007/s10664-019-09779-6
