Traceability recovery between bug reports and test cases-a Mozilla Firefox case study

Automated Software Engineering

Abstract

Automatic recovery of traceability between software artifacts may promote early detection of issues and improve the estimation of change impact. Information Retrieval (IR) techniques have been proposed for the task, but they differ considerably in input parameters and results. It is difficult to assess results when those techniques are applied in isolation, usually in small or medium-sized software projects. Recently, multilayered approaches to machine learning, in particular Deep Learning (DL), have achieved success in text classification through their capacity to model complex relationships among data. In this article, we apply several IR and DL techniques to investigate automatic traceability recovery between bug reports and manual test cases, using historical data from Mozilla Firefox’s Quality Assurance (QA) team. In this case study, we assess the following IR techniques: LSI, LDA, and BM25, in addition to a DL architecture, Convolutional Neural Networks (CNNs), applied through Word Embeddings. In this traceability context, we observe poor performance from three of the four studied techniques. Only the LSI technique presented acceptable results, standing out even over the state-of-the-art BM25 technique. The obtained results suggest that the semi-automatic application of the LSI technique, with an appropriate combination of thresholds, may be feasible for real-world software projects.
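To make the idea of threshold-based link recovery concrete, the following minimal sketch ranks test cases against a bug report with BM25 (one of the techniques named in the abstract) and keeps only candidates that pass two cuts. It is an illustration under stated assumptions, not the authors' implementation: the function names `bm25_scores` and `recover_links`, the naive whitespace tokenizer, the `top_n` and `min_sim` parameters, the normalization of scores by the best score, and the sample texts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the logarithm).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def recover_links(bug_report, test_cases, top_n=2, min_sim=0.5):
    """Return indices of test cases that survive both threshold cuts.

    Assumption: scores are divided by the best score so that `min_sim`
    can be expressed on a 0..1 scale; the paper may combine thresholds
    differently.
    """
    scores = bm25_scores(bug_report, test_cases)
    if not scores or max(scores) == 0:
        return []
    best = max(scores)
    sims = [s / best for s in scores]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    # Keep at most `top_n` candidates, and only those similar enough.
    return [i for i in ranked[:top_n] if sims[i] >= min_sim]


# Hypothetical sample data, for illustration only.
test_cases = [
    "open a new private browsing window and check history is not saved",
    "scroll a long page with touch and verify smooth zoom",
    "install an add-on and restart the browser",
]
bug_report = "history is saved in private browsing window"
print(recover_links(bug_report, test_cases))
```

In a semi-automatic setting, the returned candidates would be shown to an analyst for confirmation rather than accepted outright; loosening `top_n` or `min_sim` trades precision for recall.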

Data availability

All the data is available in the following repository: https://github.com/guilhermemg/trace-links-tc-br.

Notes

  1. https://www.mozilla.org/

  2. https://bugzilla.mozilla.org

  3. https://wiki.mozilla.org/Platform/GFX/APZ

  4. Given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution \(P(X, Y)\) on \(X \times Y\) (Ng and Jordan 2002).

  5. https://github.com/svn2github/word2vec

  6. https://commoncrawl.org/

  7. https://www.mozilla.org

  8. https://public.etherpad-mozilla.org/

  9. http://bugzilla.mozilla.org

  10. Bug Fields: https://bugs.documentfoundation.org/page.cgi?id=fields.html

  11. PyBossa Platform: https://pybossa.com/

  12. https://support.mozilla.org, https://wiki.mozilla.org/QA/, https://www.paessler.com/manuals, https://addons.mozilla.org, https://developer.mozilla.org

  13. NLTK: https://www.nltk.org

  14. SciKit: https://scikit-learn.org/stable/

  15. Gensim: https://radimrehurek.com/gensim/

  16. SpaCy: https://spacy.io/

  17. https://github.com/guilhermemg/trace-links-tc-br

  18. https://nlp.stanford.edu/projects/glove/

  19. https://spacy.io/models/en

References

  • Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., Merlo, E.: Recovering traceability links between code and documentation. IEEE Trans. Softw. Eng. 28(10), 970–983 (2002). https://doi.org/10.1109/TSE.2002.1041053

  • Berry, D.M.: Evaluation of tools for hairy requirements and software engineering tasks. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference Workshops. REW 2017, 284–291 (2017). https://doi.org/10.1109/REW.2017.25

  • Bjarnason, E., Unterkalmsteiner, M., Borg, M., Engström, E.: A multi-case study of agile requirements engineering and the use of test cases as requirements. Inf. Softw. Technol. (2016). https://doi.org/10.1016/j.infsof.2016.03.008

  • Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). https://doi.org/10.1111/j.1365-2966.2012.21196.x. arXiv:1111.6189

  • Borg, M., Runeson, P., Ardö, A.: Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir. Softw. Eng. 19(6), 1565–1616 (2014). https://doi.org/10.1007/s10664-013-9255-y

  • Büttcher, S., Clarke, C.L.A., Cormack, G.V.: Information retrieval: implementing and evaluating search engines. MIT Press, Cambridge (2010)

  • Canfora, G., Cerulo, L.: Fine grained indexing of software repositories to support impact analysis. Adv. Mater. Res. (2006). https://doi.org/10.4028/www.scientific.net/AMR.785-786.1516

  • Capobianco, G., De Lucia, A., Oliveto, R., Panichella, A., Panichella, S.: On the role of the nouns in IR-based traceability recovery. In: IEEE International Conference on Program Comprehension pp 148–157, (2009a) https://doi.org/10.1109/ICPC.2009.5090038

  • Capobianco, G., De Lucia, A., Oliveto, R., Panichella, A., Panichella, S.: Traceability recovery using numerical analysis. In: Proceedings - Working Conference on Reverse Engineering, WCRE pp 195–204, (2009b) https://doi.org/10.1109/WCRE.2009.14

  • Davies, S., Roper, M.: What’s in a bug report?. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM ’14 pp 1–10, (2014) https://doi.org/10.1145/2652524.2652541

  • De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Can information retrieval techniques effectively support traceability link recovery?. In: IEEE International Conference on Program Comprehension 2006, 307–316 (2006). https://doi.org/10.1109/ICPC.2006.15

  • De Lucia, A., Oliveto, R., Tortora, G.: Assessing IR-based traceability recovery tools through controlled experiments. Empir. Softw. Eng. 14(1), 57–92 (2009). https://doi.org/10.1007/s10664-008-9090-8

  • Deerwester, S., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990). https://doi.org/10.1017/CBO9781107415324.004

  • Dekhtyar, A., Fong, V.: RE Data Challenge: Requirements Identification with Word2Vec and TensorFlow. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference. RE 2017, 484–489 (2017). https://doi.org/10.1109/RE.2017.26

  • Dekhtyar, A., Hayes, J.H., Sundaram, S., Holbrook, A., Dekhtyar, O.: Technique integration for requirements assessment. In: Proceedings - 15th IEEE International Requirements Engineering Conference. RE 2007, 141–152 (2007). https://doi.org/10.1109/RE.2007.60

  • Eder, S., Hauptmann, B., Junker, M., Vaas, R., Prommer, K.H.: Selecting manual regression test cases automatically using trace link recovery and change coverage. In: Proceedings of the 9th International Workshop on Automation of Software Test, Association for Computing Machinery, New York, NY, USA, AST 2014, p. 29–35 (2014)

  • Falessi, D., Cantone, G., Canfora, G.: A comprehensive characterization of NLP techniques for identifying equivalent requirements. In: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM ’10 p 1, (2010) https://doi.org/10.1145/1852786.1852810

  • Falessi, D., Di Penta, M., Canfora, G., Cantone, G.: Estimating the number of remaining links in traceability recovery. Empir. Softw. Eng. 22(3), 996–1027 (2017). https://doi.org/10.1007/s10664-016-9460-6

  • Fazzini, M., Prammer, M., D’Amorim, M., Orso, A.: Automatically translating bug reports into test cases for mobile apps. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis - ISSTA 2018 pp 141–152, (2018) https://doi.org/10.1145/3213846.3213869

  • Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. (2016) https://doi.org/10.1016/B978-0-12-801775-3.00001-9, arXiv:1011.1669v3

  • Gotel, O.C.Z., Finkelstein, A.C.W.: An Analysis of the Requirements Traceability Problem. In: 1st International Conference on Requirements Engineering (RE 1994) pp 94–101, (1994) https://doi.org/10.1109/ICRE.1994.292398

  • Guo, J., Cheng, J., Cleland-Huang, J.: Semantically Enhanced Software Traceability Using Deep Learning Techniques. In: Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering, ICSE 2017 pp 3–14, (2017) https://doi.org/10.1109/ICSE.2017.9, arXiv:1804.02438

  • Hayes, J.H., Dekhtyar, A., Sundaram, S.K.: Tracing and mapping : supporting software quality predictions (2005)

  • Hayes, J.H., Dekhtyar, A., Sundaram, S.K.: Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006). https://doi.org/10.1109/TSE.2006.3

  • Hayes, J.H., Dekhtyar, A., Sundaram, S.K., Holbrook, E.A., Vadlamudi, S., April, A.: Requirements tracing on target (RETRO): improving software maintenance through traceability recovery. Innov. Syst. Softw. Eng. 3(3), 193–202 (2007). https://doi.org/10.1007/s11334-007-0024-1

  • Hemmati, H., Sharifi, F.: Investigating NLP-Based Approaches for Predicting Manual Test Case Failure. In: Proceedings - 2018 IEEE 11th International Conference on Software Testing, Verification and Validation, ICST 2018 pp 309–319, (2018) https://doi.org/10.1109/ICST.2018.00038

  • Hoffman, M.D., Blei, D.M., Bach, F.R.: Online Learning for Latent Dirichlet Allocation. AcademiaEdu pp 1–5 (2012)

  • Kaushik, N., Tahvildari, L., Moore, M.: Reconstructing traceability between bugs and test cases: an experimental study. In: Proceedings - Working Conference on Reverse Engineering, WCRE pp 411–414, (2011) https://doi.org/10.1109/WCRE.2011.58

  • Chen, K., Zhang, W., Zhao, H., Mei, H.: An approach to constructing feature models based on requirements clustering. pp 31–40, (2005) https://doi.org/10.1109/re.2005.9

  • Lee, D.: How to write a bug report that will make your engineers love you. (2016) Retrieved May 30, 2019 from https://testlio.com/blog/the-ideal-bug-report

  • Lormans, M., Van Deursen, A.: Can LSI help reconstructing requirements traceability in design and test?. In: Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR pp. 47–56, (2006) https://doi.org/10.1109/CSMR.2006.13

  • De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Improving IR-based traceability recovery using smoothing filters. In: IEEE International Conference on Program Comprehension pp. 21–30, (2011) https://doi.org/10.1109/ICPC.2011.34

  • De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Applying a smoothing filter to improve IR-based traceability recovery processes: an empirical investigation. Inf. Softw. Technol. 55, 741–754 (2013)

  • Manning, C.D., Raghavan, P., Schütze, H.: An introduction to information retrieval. Cambridge University Press, Cambridge (2009)

  • Mäntylä, M.V., Khomh, F., Adams, B., Engström, E., Petersen, K.: On rapid releases and software testing. (2013). https://doi.org/10.1109/ICSM.2013.13

  • Merten, T., Krämer, D., Mager, B., Schell, P., Bürsner, S., Paech, B.: Do Information Retrieval Algorithms for Automated Traceability Perform Effectively on Issue Tracking System Data? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9619, 45–62 (2016). https://doi.org/10.1007/978-3-319-30282-9_4

  • Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. (2013) arXiv:1301.3781

  • Mills, C.: Automating traceability link recovery through classification. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017, ACM Press, New York, New York, USA, pp. 1068–1070, (2017) https://doi.org/10.1145/3106237.3121280

  • Minelli, R., Lanza, M.: Software analytics for mobile applications–insights & lessons learned. In: 17th European Conference on Software Maintenance and Reengineering, pp. 144–153 (2013)

  • Oliveto, R., Gethers, M., Poshyvanyk, D., De Lucia, A.: On the equivalence of information retrieval methods for automated traceability link recovery. In: IEEE International Conference on Program Comprehension pp 68–71, (2010) https://doi.org/10.1109/ICPC.2010.20

  • Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., De Lucia, A.: How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms. In: Proceedings - International Conference on Software Engineering pp. 522–531, (2013) https://doi.org/10.1109/ICSE.2013.6606598

  • Passos, L., Czarnecki, K., Apel, S., Wa̧sowski, A., Kästner, C., Guo, J.: Feature-oriented software evolution p 1, (2013) https://doi.org/10.1145/2430502.2430526

  • Pennington, J., Socher, R., Manning, C.: Glove: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 1532–1543, (2014) https://doi.org/10.3115/v1/D14-1162, arXiv:1504.06654

  • Robertson, S., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond, vol 3. (2009) https://doi.org/10.1561/1500000019

  • Sabev, P., Grigorova, K.: Manual to automated testing: An effort-based approach for determining the priority of software test automation (2015)

  • Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975). https://doi.org/10.1145/361219.361220

  • Sommerville, I.: Software engineering, 9th edn. Addison-Wesley, Boston (2010). https://doi.org/10.1111/j.1365-2362.2005.01463.x

  • Ng, A.Y., Jordan, M.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv. Neural Inf. Process. Sys. 2, 841–848 (2002)

  • Yadla, S., Hayes, J.H., Dekhtyar, A.: Tracing requirements to defect reports: an application of information retrieval techniques. Innov. Syst. Softw. Eng. 1(2), 116–124 (2005). https://doi.org/10.1007/s11334-005-0011-3

  • Zimmermann, T., Premraj, R., Bettenburg, N., Just, S., Schröter, A., Weiss, C.: What makes a good bug report? IEEE Trans. Softw. Eng. 36(5), 618–643 (2010). https://doi.org/10.1109/TSE.2010.63

Acknowledgements

We would like to thank the Brazilian agency CAPES for partially funding this research. We would also like to thank all the volunteers who kindly participated in our study. Last but not least, we thank the anonymous reviewers, whose contributions were decisive in improving preliminary versions of the article.

Funding

This research was partially funded by the Brazilian federal agency CAPES of the Ministry of Education (MEC/Brazil).

Author information

Corresponding author

Correspondence to Guilherme Gadelha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

All the code is available in the same repository as the data and material: https://github.com/guilhermemg/trace-links-tc-br.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gadelha, G., Ramalho, F. & Massoni, T. Traceability recovery between bug reports and test cases-a Mozilla Firefox case study. Autom Softw Eng 28, 8 (2021). https://doi.org/10.1007/s10515-021-00287-w

  • DOI: https://doi.org/10.1007/s10515-021-00287-w
