Skip to main content
Log in

Code smell detection using multi-label classification approach

  • Published:
Software Quality Journal Aims and scope Submit manuscript

Abstract

Code smells are characteristics of the software that indicates a code or design problem which can make software hard to understand, evolve, and maintain. There are several code smell detection tools proposed in the literature, but they produce different results. This is because smells are informally defined or subjective in nature. Machine learning techniques help in addressing the issues of subjectivity, which can learn and distinguish the characteristics of smelly and non-smelly source code elements (classes or methods). However, the existing machine learning techniques can only detect a single type of smell in the code element that does not correspond to a real-world scenario as a single element can have multiple design problems (smells). Further, the mechanisms proposed in the literature could not detect code smells by considering the correlation (co-occurrence) among them. To address these shortcomings, we propose and investigate the use of multi-label classification (MLC) methods to detect whether the given code element is affected by multiple smells or not. In this proposal, two code smell datasets available in the literature are converted into a multi-label dataset (MLD). In the MLD, we found that there is a positive correlation between the two smells (long method and feature envy). In the classification phase, the two methods of MLC considered the correlation among the smells and enhanced the performance (on average more than 95% accuracy) for the 10-fold cross-validation with the ten iterations. The findings reported help the researchers and developers in prioritizing the critical code elements for refactoring based on the number of code smells detected.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

References

  • Abdelmoez, W, Kosba, E, Iesa, AF. (2014). Risk-based code smells detection tool. In The international conference on computing technology and information management (ICCTIM2014) (pp. 148–159): The Society of Digital Information and Wireless Communication.

  • Amorim, L, Costa, E, Antunes, N, Fonseca, B, Ribeiro, M. (2015). Experience report: evaluating the effectiveness of decision trees for detecting code smells. In 2015 IEEE 26th international symposium on software reliability engineering (ISSRE) (pp. 261–269): IEEE.

  • Azeem, M.I., Palomba, F., Shi, L., Wang, Q. (2019). Machine learning techniques for code smell detectio: a systematic literature review and meta-analysis. Information and Software Technology.

  • Booch, G. (1980). Object-oriented analysis and design. Addison-Wesley.

  • Boutell, M.R., Luo, J., Shen, X., Brown, C.M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.

    Article  Google Scholar 

  • Bowes, D, Randall, D, Hall, T. (2013). The inconsistent measurement of message chains. In 2013 4th International workshop on emerging trends in software metrics (WETSoM) (pp. 62–68): IEEE.

  • Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F. (2015). Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing, 163, 3–16.

    Article  Google Scholar 

  • Ciupke, O. (1999). Automatic detection of design problems in object-oriented reengineering. In Technology of object-oriented languages and systems, 1999. TOOLS 30 Proceedings (pp. 18–32): IEEE.

  • Di Nucci, D., Palomba, F., Tamburri, D.A., Serebrenik, A., De Lucia, A. (2018). Detecting code smells using machine learning techniques: are we there yet?. In 2018 IEEE 25th International conference on software analysis, evolution and reengineering SANER (pp. 612–621): IEEE.

  • Ferme, V. (2013). Jcodeodor: a software quality advisor through design flaws detection. Master’s thesis University of Milano-Bicocca, Milano, Italy.

  • Fontana, F.A., & Zanoni, M. (2017). Code smell severity classification using machine learning techniques. Knowledge-Based Systems, 128, 43–58.

    Article  Google Scholar 

  • Fontana, F.A., Braione, P., Zanoni, M. (2012). Automatic detection of bad smells in code: an experimental assessment. Journal of Object Technology, 11(2), 5–1.

    Google Scholar 

  • Fontana, F.A., Dietrich, J., Walter, B., Yamashita, A., Zanoni, M. (2016a). Antipattern and code smell false positives: preliminary conceptualization and classification. In 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), (Vol. 1 pp. 609–613): IEEE.

  • Fontana, F.A., Mäntylä, M.V., Zanoni, M., Marino, A. (2016b). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering, 21(3), 1143–1191.

  • Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D. (1999). Refactoring: improving the design of existing programs.

  • Godbole, S, & Sarawagi, S. (2004). Discriminative methods for multi-labeled classification. In Pacific-Asia conference on knowledge discovery and data mining (pp. 22–30): Springer.

  • Guo, Y., & Gu, S. (2011). Multi-label classification using conditional dependency networks. In IJCAI Proceedings-international joint conference on artificial intelligence, (Vol. 22 p. 1300).

  • Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S. (2011). Developing fault-prediction models: what the research can show industry. IEEE Software, 28(6), 96–99.

    Article  Google Scholar 

  • Kessentini, W., Kessentini, M., Sahraoui, H., Bechikh, S., Ouni, A. (2014). A cooperative parallel search-based software engineering approach for code-smells detection. IEEE Transactions on Software Engineering, 40(9), 841–861.

    Article  Google Scholar 

  • Khomh, F, Vaucher, S, Guéhéneuc, YG, Sahraoui, H. (2009). A Bayesian approach for the detection of code and design smells. In 9th International conference on quality software, 2009. QSIC’09 (pp. 305–314): IEEE.

  • Khomh, F., Vaucher, S., Guéhéneuc, Y.G, Sahraoui, H. (2011). Bdtex: a gqm-based Bayesian approach for the detection of antipatterns. Journal of Systems and Software, 84(4), 559–572.

    Article  Google Scholar 

  • Kreimer, J. (2005). Adaptive detection of design flaws. Electronic Notes in Theoretical Computer Science, 141(4), 117–136.

    Article  Google Scholar 

  • Liu, H., Guo, X., Shao, W. (2013). Monitor-based instant software refactoring. IEEE Transactions on Software Engineering, 1.

  • Maiga, A, Ali, N, Bhattacharya, N, Sabané, A, Guéhéneuc, YG, Antoniol, G, Aïmeur, E. (2012). Support vector machines for anti-pattern detection. In 2012 Proceedings of the 27th IEEE/ACM international conference on automated software engineering (ASE) (pp. 278–281): IEEE.

  • Maneerat, N., & Muenchaisri, P. (2011). Bad-smell prediction from software design model using machine learning techniques. In 2011 Eighth international joint conference on computer science and software engineering (JCSSE) (pp. 331–336): IEEE.

  • Marinescu, R. (2002). Measurement and quality in objectoriented design. IEEE International Conference on Software Maintenance.

  • Marinescu, R. (2004). Detection strategies: metrics-based rules for detecting design flaws. In 20th IEEE International conference on software maintenance, 2004. Proceedings (pp. 350–359): IEEE.

  • Marinescu, R. (2005). Measurement and quality in object-oriented design. In Proceedings of the 21st IEEE international conference on software maintenance, 2005. ICSM’05 (pp. 701–704): IEEE.

  • Moha, N., Gueheneuc, Y.G., Duchien, A.F., et al. (2010a). Decor: a method for the specification and detection of code and design smells. IEEE Transactions on Software Engineering (TSE), 36(1), 20–36.

  • Moha, N., Guéhéneuc, Y.G., Le Meur, A.F., Duchien, L., Tiberghien, A. (2010b). From a domain analysis to the specification and detection of code and design smells. Formal Aspects of Computing, 22(3-4), 345–361.

  • Murphy-Hill, E, & Black, AP. (2010). An interactive ambient visualization for code smells. In Proceedings of the 5th international symposium on software visualization (pp. 5–14): ACM.

  • Nongpong, K. (2012). Integrating “code smells” detection with refactoring tool support. Thesis, University of Wisconsin-Milwaukee.

  • Opdyke, W.F. (1992). Refactoring: a program restructuring aid in designing object-oriented application frameworks PhD thesis. PhD thesis: University of Illinois at Urbana-Champaign.

    Google Scholar 

  • Palomba, F, Bavota, G, Di Penta, M, Oliveto, R, De Lucia, A, Poshyvanyk, D. (2013). Detecting bad smells in source code using change history information. In Proceedings of the 28th IEEE/ACM international conference on automated software engineering (pp. 268–278): IEEE Press.

  • Palomba, F., Bavota, G., Di Penta, M., Oliveto, R., Poshyvanyk, D., De Lucia, A. (2015). Mining version histories for detecting code smells. IEEE Transactions on Software Engineering, 41(5), 462–489.

    Article  Google Scholar 

  • Palomba, F, Oliveto, R, De Lucia, A. (2017). Investigating code smell co-occurrences using association rule learning: a replicated study. In IEEE Workshop on machine learning techniques for software quality evaluation (MaLTeSQuE) (pp. 8–13): IEEE.

  • Palomba, F., Bavota, G., Di Penta, M., Fasano, F., Oliveto, R., De Lucia, A. (2018). On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empirical Software Engineering, 23(3), 1188–1221.

    Article  Google Scholar 

  • Pecorelli, F, Di Nucci, D, De Roover, C, De Lucia, A. (2019a). On the role of data balancing for machine learning-based code smell detection. In Proceedings of the 3rd ACM SIGSOFT international workshop on machine learning techniques for software quality evaluation (pp. 19–24): ACM.

  • Pecorelli, F, Palomba, F, Di Nucci, D, De Lucia, A. (2019b). Comparing heuristic and machine learning approaches for metric-based code smell detection. In Proceedings of the 27th international conference on program comprehension (pp. 93–104): IEEE Press.

  • Rao, A.A., & Reddy, K.N. (2007). Detecting bad smells in object oriented design using design change propagation probability matrix 1.

  • Rasool, G., & Arshad, Z. (2015). A review of code smell mining techniques. Journal of Software: Evolution and Process, 27(11), 867–895.

    Google Scholar 

  • Read, J, Pfahringer, B, Holmes, G. (2008). Multi-label classification using ensembles of pruned sets. In 2008 Eighth IEEE international conference on data mining (pp. 995–1000): IEEE.

  • Read, J., Pfahringer, B., Holmes, G., Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333.

    Article  MathSciNet  Google Scholar 

  • Read, J., Reutemann, P., Pfahringer, B., Holmes, G. (2016). Meka: a multi-label/multi-target extension to weka. The Journal of Machine Learning Research, 17(1), 667–671.

    MathSciNet  MATH  Google Scholar 

  • Sheikh, L.M., Tanveer, B., Hamdani, M. (2004). Interesting measures for mining association rules. In 8th International multitopic conference, 2004. Proceedings of INMIC 2004 (pp. 641–644): IEEE.

  • Sorower, M.S. (2010). A literature survey on algorithms for multi-label learning. Oregon State University, Corvallis, p. 18.

  • Tempero, E, Anslow, C, Dietrich, J, Han, T, Li, J, Lumpe, M, Melton, H, Noble, J. (2010). The qualitas corpus: a curated collection of java code for empirical studies. In Software engineering conference (APSEC), 2010 17th Asia Pacific (pp. 336–345): IEEE.

  • Travassos, G., Shull, F., Fredericks, M., Basili, V.R. (1999). Detecting defects in object-oriented designs: using reading techniques to increase software quality. In ACM sigplan notices, (Vol. 34 pp. 47–56): ACM.

  • Tsantalis, N., & Chatzigeorgiou, A. (2009). Identification of move method refactoring opportunities. IEEE Transactions on Software Engineering, 35(3), 347–367.

    Article  Google Scholar 

  • Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining (IJDWM), 3(3), 1–13.

    Article  Google Scholar 

  • Tsoumakas, G., Katakis, I., Vlahavas, I. (2011). Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23 (7), 1079–1089.

    Article  Google Scholar 

  • Tufano, M., Palomba, F., Bavota, G., Oliveto, R., Di Penta, M., De Lucia, A., Poshyvanyk, D. (2017). When and why your code starts to smell bad (and whether the smells go away). IEEE Transactions on Software Engineering, 43(11), 1063–1088.

    Article  Google Scholar 

  • Wang, X, Dang, Y, Zhang, L, Zhang, D, Lan, E, Mei, H. (2012). Can i clone this piece of code here?. In Proceedings of the 27th IEEE/ACM international conference on automated software engineering (pp. 170–179): ACM.

  • White, M, Tufano, M, Vendome, C, Poshyvanyk, D. (2016). Deep learning code fragments for code clone detection. In Proceedings of the 31st IEEE/ACM international conference on automated software engineering (pp. 87–98): ACM.

  • Yang, J., Hotta, K., Higo, Y., Igaki, H., Kusumoto, S. (2015). Classification model for code clones based on machine learning. Empirical Software Engineering, 20 (4), 1095–1125.

    Article  Google Scholar 

  • Zaidi, MA, & Colomo-Palacios, R. (2019). Code smells enabled by artificial intelligence: a systematic mapping. In International conference on computational science and its applications (pp. 418–427): Springer.

  • Zhang, M.-L., & Zhou, Z.-H. (2013). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thirupathi Guggulothu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection on Quality Management for Information Systems

Guest Editors: Mario Piattini, Ignacio García Rodríguez de Guzmán, Ricardo Pérez del Castillo

Appendix

Appendix

Table 13 Results of BR method using top 5 single label classifers
Table 14 Results of CC method using top 5 single label classifers
Table 15 Results of LC method using top 5 single label classifers

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Guggulothu, T., Moiz, S.A. Code smell detection using multi-label classification approach. Software Qual J 28, 1063–1086 (2020). https://doi.org/10.1007/s11219-020-09498-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11219-020-09498-y

Keywords

Navigation