


Code smell detection using feature selection and stacking ensemble: An empirical investigation
Information and Software Technology (IF 3.9), Pub Date: 2021-05-27, DOI: 10.1016/j.infsof.2021.106648
Amal Alazba, Hamoud Aljamaan

Context:

Code smell detection is the process of identifying pieces of code that are poorly designed and implemented. Recently, more research has been directed towards machine learning-based approaches to code smell detection. Many classifiers have been explored in the literature, yet an effective model for detecting different code smell types has not been found.

Objective:

The main objective of this paper is to empirically investigate the capabilities of a stacking heterogeneous ensemble model in code smell detection.

Methods:

The Gain feature selection technique was applied to select the features relevant to code smell detection. The detection performance of 14 individual classifiers was investigated in the context of two class-level and four method-level code smells. Then, three stacking ensembles were built using all individual classifiers as base classifiers, each with a different meta-classifier (LR, SVM, or DT).
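
The workflow described above (feature selection, heterogeneous base classifiers, then a meta-classifier trained on their outputs) might be sketched as follows. This is an illustrative sketch, not the authors' setup: it assumes scikit-learn, uses synthetic data in place of the code smell datasets, substitutes mutual information for the Gain criterion, keeps an arbitrary 10 features, and uses only three base classifiers rather than the 14 studied.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a code smell dataset. Rows play the role
# of classes or methods, columns the role of source code metrics, and y marks
# smelly (1) versus clean (0) instances.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=8, random_state=0
)

# A small heterogeneous set of base classifiers (the paper uses 14).
base_classifiers = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm_lin", SVC(kernel="linear")),
    ("nb", GaussianNB()),
]

# Stacking: the meta-classifier (here LR) is trained on out-of-fold
# predictions of the base classifiers.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Assumption: mutual information stands in for the Gain criterion, and
# k=10 is an arbitrary choice for illustration.
selector = SelectKBest(
    score_func=partial(mutual_info_classif, random_state=0), k=10
)

# Scaling and selection sit inside the pipeline so they are refit on the
# training part of every cross-validation fold.
model = make_pipeline(StandardScaler(), selector, stack)

scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```

Swapping `final_estimator` for `SVC()` or `DecisionTreeClassifier()` would give the other two meta-classifier variants named above. The accuracy printed here reflects the synthetic data only and says nothing about the paper's results.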

Results:

The GP, MLP, DT and SVM(Lin) classifiers were among the best performing in detecting most of the code smells. On the other hand, SVM(Sig), NB(B), NB(M), and SGD were among the least accurate classifiers for most smell types. The stacking ensembles with LR and SVM meta-classifiers achieved consistently high detection performance on both class-level and method-level code smells compared to all individual models.

Conclusion:

This paper concludes that the detection performance of the majority of individual classifiers varied from one code smell type to another. However, the detection performance of the stacking ensembles with LR and SVM meta-classifiers was consistently superior to that of all individual classifiers across the different code smell types.
