Code smell detection using feature selection and stacking ensemble: An empirical investigation
Introduction
Software engineers occasionally make poor implementation decisions under the time pressure of tight project deadlines, which creates the need for code cleaning (i.e., refactoring). Software engineers typically consider refactoring when the software has been in use for a long time and has the potential for additional development [1]. Refactoring is defined as the process of improving the quality of the software's internal structure without altering system functionality [2]. Even though refactoring is considered a time-consuming and labor-intensive process, it offers many benefits, such as improved software quality and design, increased software readability and understandability, easier localization of software defects, and a smoother software development process [2]. However, the refactoring process is tricky and raises several challenges, since it requires changes that might introduce new bugs and alter the software's internal behavior [3]. Therefore, a key element to the success of the refactoring process is identifying the code fragments that need to be changed (i.e., code smells).
Code smells are poor design and implementation choices that might negatively affect important software quality attributes (e.g., understandability, reusability and maintainability) [4], [5]. Different code smell types exist in the software engineering literature, such as duplicated code, long method, large class and long parameter list [2]. These smells can be considered indicators of refactoring opportunities [6]. Therefore, it is quite important to detect code smells in order to refactor them. The process of identifying code smells is commonly known as code smell detection. There are three main approaches to code smell detection: metrics-based [7], rule-based [8] and machine learning-based [9]. The metrics-based approach requires defining quality metrics (e.g., inheritance, size and cohesion) and then establishing a threshold value for each metric. However, choosing the right threshold is not a trivial task in this approach. In the rule-based approach, domain experts need to specify rules that define each code smell. These rules are sometimes generated manually and expressed in a domain-specific language. Due to the effort and cognitive load required from software engineers in the former two approaches, recent research has been directed more towards machine learning approaches [9], [10], [11].
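The metrics-based approach described above can be illustrated with a minimal sketch; the metric choices and threshold values below are illustrative assumptions, not values proposed in the cited literature:

```python
# Minimal sketch of a metrics-based detector for the "long method" smell.
# The metrics (lines of code, cyclomatic complexity) and thresholds are
# illustrative assumptions; real detectors calibrate thresholds per corpus.
def detect_long_method(loc: int, complexity: int,
                       loc_threshold: int = 50,
                       complexity_threshold: int = 10) -> bool:
    """Flag a method as a 'long method' smell when its size and
    complexity both exceed their fixed thresholds."""
    return loc > loc_threshold and complexity > complexity_threshold

print(detect_long_method(loc=120, complexity=15))  # True: smelly method
print(detect_long_method(loc=20, complexity=3))    # False: clean method
```

The difficulty noted above is visible even in this toy: the detector's quality depends entirely on choosing suitable threshold values.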
Recently, the application of machine learning classifiers to code smell detection has been investigated. In this approach, software metrics are extracted from the source code and fed to machine learning classifiers, which produce detection rules and thresholds automatically. Hence, this reduces the effort required from software engineers and facilitates the detection process. Based on recently published systematic literature reviews [12], [13], several machine learning classifiers have been utilized to detect code smells [14], [15], [16], including decision trees, support vector machines, random forests and naive Bayes. However, no single classifier has yet been found to perform effectively across different code smell types [12], [13].
Ensemble learning [17] is an active research area in machine learning that can enhance the performance of individual machine learning classifiers. Ensembles combine the outputs of individual classifiers into a single output. They can be classified into two categories: homogeneous and heterogeneous. Homogeneous ensembles are built from classifiers of the same type trained on different parts of the dataset, while heterogeneous ensembles are built from classifiers of different types. Ensemble models have proven to be an effective way to outperform individual classifiers in class defect prediction [18], [19] and software maintainability prediction [20]. However, recent systematic literature reviews [12], [13] report that ensemble models have not been widely explored for code smell detection. In this paper, we investigate machine learning-based approaches that employ ensemble learning for code smell detection. An empirical study is conducted to investigate to what extent a stacking heterogeneous ensemble [21] increases code smell detection performance over individual classifiers. Ensemble learning is expected to improve the performance of the detection models, which in turn facilitates the refactoring process and leads to overall improved software quality.
Paper Organization. We followed the structured experiment report template and guidelines proposed by Jedlitschka et al. [22] for reporting empirical software engineering investigations. The rest of this paper is organized as follows: Section 2 sheds light on the background needed in this research. Section 3 summarizes the related literature. Section 4 gives a detailed description of the empirical study goal, the code smell datasets used, the data preprocessing performed, and the measures used to evaluate the classifiers' detection performance. Section 5 discusses in detail the detection performance results for the individual classifiers and ensemble models. Section 6 discusses the identified threats to the validity of our empirical study. Section 7 concludes the paper with directions for future work.
Background
In this section, we outline the definition of the code smells investigated in this research. Then, we present an overview of the heterogeneous stacking ensemble used for code smell detection.
Literature review
Identifying code smells in source code is an active research area in software refactoring. Different techniques have been proposed to identify and detect code smells such as metrics-based [7], rule-based [8] and machine learning-based [9]. However, few researchers have investigated the employment of machine learning techniques in code smell detection [12].
Different types of machine learning classifiers have been used to detect code smells, and most of the reported empirical studies employ a
Empirical study
In this section, we describe our empirical study in detail. First, the goal of the empirical study is stated. Then, we give an overview of the code smell datasets used, followed by a description of the data preprocessing steps. We then describe the model validation procedure used to validate the built machine learning classifiers. Finally, the detection performance metrics are presented.
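As a concrete illustration of such a validation step, the sketch below runs stratified 10-fold cross-validation on synthetic data; the dataset, the decision tree classifier and the F1 scoring are assumptions for illustration, not the paper's exact configuration:

```python
# Sketch: stratified 10-fold cross-validation of a single detector.
# Synthetic data stands in for a code smell dataset (rows = code
# elements, columns = software metrics, label = smelly or not).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=cv, scoring="f1")
print(f"mean F1 over 10 folds: {scores.mean():.3f}")
```

Stratification preserves the class ratio in every fold, which matters because code smell datasets are typically imbalanced.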
Results and discussions
In our experiment, the machine learning pipeline was implemented in Python. A total of 14 classifiers were used: DT, SVM with four kernels (Lin, Poly, Sig and RBF), Bernoulli NB, Gaussian NB, Multinomial NB, LR, MLP, SGD, GP, KNN and LDA. We used the 14 classifiers as base classifiers to build the stacking ensemble. All of the classifiers and ensemble models were built and trained using the scikit-learn framework [63].
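A minimal sketch of such a stacking setup with scikit-learn's StackingClassifier is shown below, using a subset of the listed base classifiers on synthetic data; the logistic regression meta-learner and the data are assumptions for illustration, not the paper's exact configuration:

```python
# Sketch: heterogeneous stacking ensemble built from a subset of the
# base classifiers named above; a meta-learner combines their outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
base_classifiers = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm_rbf", SVC(kernel="rbf", probability=True, random_state=0)),
    ("gnb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(estimators=base_classifiers,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X, y)
print(f"training accuracy: {stack.score(X, y):.3f}")
```

StackingClassifier internally cross-validates the base classifiers to produce the meta-learner's training features, which reduces the risk of the meta-learner overfitting to base-classifier training predictions.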
Threats to validity
Identifying and assessing threats to validity is necessary to ensure the quality of our empirical study findings. Each threat is discussed with the measures taken to mitigate the identified threats.
Conclusion
This paper empirically investigated the application of stacking ensembles to code smell detection and evaluated to what extent the stacking ensemble increases detection performance over individual classifiers. The paper's main contributions can be summarized as follows: First, we applied the gain ratio feature selection technique to investigate the importance of metrics of different granularity as predictors in code smell detection. Second, we evaluated the application of 14 individual
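The gain ratio technique mentioned above can be sketched as follows; this is a minimal implementation for discrete features, with toy data as an illustrative assumption:

```python
# Sketch: gain ratio for discrete features. Gain ratio normalizes a
# feature's information gain by its split information, penalizing
# features that split the data into many small partitions.
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to the labels."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    conditional = sum(w * entropy(labels[feature == v])
                      for v, w in zip(values, weights))
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data: one feature perfectly predicts the label, one is independent.
labels  = np.array([0, 0, 1, 1])
perfect = np.array([0, 0, 1, 1])
useless = np.array([0, 1, 0, 1])
print(gain_ratio(perfect, labels))  # 1.0
print(gain_ratio(useless, labels))  # 0.0
```

Ranking metrics by gain ratio and keeping the top-scoring ones is one common way to select predictors before training the detection models.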
CRediT authorship contribution statement
Amal Alazba: Conceptualization, Methodology, Software, Data curation, Design, analysis, Writing, and revision of the manuscript. Hamoud Aljamaan: Conceptualization, Methodology, Software, Data curation, Design, analysis, Writing, and revision of the manuscript.
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.infsof.2021.106648.
Acknowledgment
The authors would like to acknowledge the support of King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia in the development of this work.
References (65)
- et al., Code smells as system-level indicators of maintainability: An empirical study, J. Syst. Softw. (2013)
- et al., An experimental investigation on the innate relationship between quality and refactoring, J. Syst. Softw. (2015)
- et al., Machine learning techniques for code smell detection: A systematic literature review and meta-analysis, Inf. Softw. Technol. (2019)
- et al., Code smell severity classification using machine learning techniques, Knowl.-Based Syst. (2017)
- Stacked generalization, Neural Netw. (1992)
- et al., BDTEX: A GQM-based Bayesian approach for the detection of antipatterns, J. Syst. Softw. (2011)
- et al., Irrelevant features and the subset selection problem
- The importance of complexity in model selection, J. Math. Psych. (2000)
- et al., Refactoring: Improving the Design of Existing Code (1999)
- A field study of refactoring challenges and benefits
- Refactoring planning for design smell correction: Summary, opportunities and lessons learned
- Size and cohesion metrics as indicators of the long method bad smell: An empirical study
- DECOR: A method for the specification and detection of code and design smells, IEEE Trans. Softw. Eng.
- Software design smell detection: a systematic mapping study, Softw. Qual. J.
- Deep learning based feature envy detection
- A machine-learning based ensemble method for anti-patterns detection, J. Syst. Softw.
- Bad smell detection using machine learning techniques: A systematic literature review, Arab. J. Sci. Eng.
- Comparing and experimenting machine learning techniques for code smell detection, Empir. Softw. Eng.
- Ensemble-based classifiers, Artif. Intell. Rev.
- Three empirical studies on predicting software maintainability using ensemble methods, Soft Comput.
- Reporting experiments in software engineering
- AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis
- A survey of decision tree classifier methodology, IEEE Trans. Syst. Man Cybern.
- Support-vector networks, Mach. Learn.
- Beyond independence: conditions for the optimality of the simple Bayesian classifier
- Hydrological modelling using artificial neural networks, Prog. Phys. Geogr. Earth Environ.
- Large-scale machine learning with stochastic gradient descent
1 Both authors contributed equally to the work done in this manuscript.