Code smell detection using multi-label classification approach

Guggulothu, Thirupathi; Moiz, Salman Abdul

doi:10.1007/s11219-020-09498-y

Published: 04 April 2020

Software Quality Journal, Volume 28, pages 1063–1086 (2020)

Abstract

Code smells are characteristics of software that indicate a code or design problem and can make the software hard to understand, evolve, and maintain. Several code smell detection tools have been proposed in the literature, but they produce different results because smells are informally defined or subjective in nature. Machine learning techniques help address this subjectivity by learning to distinguish the characteristics of smelly and non-smelly source code elements (classes or methods). However, existing machine learning techniques detect only a single type of smell in a code element, which does not correspond to the real-world scenario in which a single element can have multiple design problems (smells). Further, the mechanisms proposed in the literature do not take the correlation (co-occurrence) among smells into account when detecting them. To address these shortcomings, we propose and investigate the use of multi-label classification (MLC) methods to detect whether a given code element is affected by multiple smells. In this proposal, two code smell datasets available in the literature are converted into a multi-label dataset (MLD). In the MLD, we found a positive correlation between the two smells (long method and feature envy). In the classification phase, the two MLC methods considered the correlation among the smells and enhanced the performance (on average more than 95% accuracy) under 10-fold cross-validation repeated for ten iterations. The reported findings help researchers and developers prioritize critical code elements for refactoring based on the number of code smells detected.
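The conversion and correlation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the method ids, and the helper names (`merge_to_multilabel`, `phi_coefficient`, `prioritize`) are all hypothetical, and the phi coefficient is used here only as one common way to measure correlation between two binary labels. The sketch joins two single-label datasets into one multi-label dataset, measures label co-occurrence, and ranks elements by the number of smells they carry.

```python
from math import sqrt

# Hypothetical toy data (not from the paper): method id -> is the smell present?
long_method = {"m1": True, "m2": True, "m3": False, "m4": False, "m5": True, "m6": False}
feature_envy = {"m1": True, "m2": False, "m3": False, "m4": False, "m5": True, "m6": True}


def merge_to_multilabel(first, second):
    """Join two single-label datasets on their shared ids.

    Each surviving element gets a label pair (first_label, second_label).
    """
    shared_ids = sorted(first.keys() & second.keys())
    return {mid: (int(first[mid]), int(second[mid])) for mid in shared_ids}


def phi_coefficient(label_pairs):
    """Phi coefficient (Pearson correlation) between two binary labels."""
    pairs = list(label_pairs)
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A zero denominator means one label is constant; report no correlation.
    return (n11 * n00 - n10 * n01) / denominator if denominator else 0.0


def prioritize(multilabel):
    """Order element ids by number of smells (most first), ties broken by id."""
    return sorted(multilabel, key=lambda mid: (-sum(multilabel[mid]), mid))


mld = merge_to_multilabel(long_method, feature_envy)
print("multi-label dataset:", mld)
print("phi correlation:", round(phi_coefficient(mld.values()), 3))
print("refactoring priority:", prioritize(mld))
```

A positive phi value on such a merged dataset indicates that the two smells tend to co-occur, which is the kind of dependency a multi-label classifier can exploit and a set of independent single-label classifiers cannot.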

Author information

Authors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India
Thirupathi Guggulothu & Salman Abdul Moiz

Corresponding author

Correspondence to Thirupathi Guggulothu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection on Quality Management for Information Systems

Guest Editors: Mario Piattini, Ignacio García Rodríguez de Guzmán, Ricardo Pérez del Castillo

Appendix

Table 13 Results of BR method using top 5 single-label classifiers

Table 14 Results of CC method using top 5 single-label classifiers

Table 15 Results of LC method using top 5 single-label classifiers

About this article

Cite this article

Guggulothu, T., Moiz, S.A. Code smell detection using multi-label classification approach. Software Qual J 28, 1063–1086 (2020). https://doi.org/10.1007/s11219-020-09498-y

Published: 04 April 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11219-020-09498-y
