Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics, Journal of Computer Science and Technology

Journal of Computer Science and Technology (IF 1.2), Pub Date: 2020-11-01, DOI: 10.1007/s11390-020-0323-7
Mohammad Y. Mhawish, Manjari Gupta

Code smell detection is essential for improving software quality, enhancing software maintainability, and reducing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning models' predictions and improve their interpretability. The datasets obtained from Fontana et al. were restructured and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) outperform kernel-based and network-based algorithms. Genetic-algorithm-based feature selection improves the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly improves the accuracy of all these algorithms. Overall, machine learning techniques show high potential for predicting code smells, which can help detect these smells and improve software quality.
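
The evaluation described above pairs software-metric features with a classifier, scores it with 10-fold cross-validation, and tunes its parameters by grid search. The sketch below illustrates only that evaluation loop, under loudly stated assumptions: the metric values are synthetic (two made-up metrics from a hypothetical `make_toy_dataset`, not the Fontana et al. data), and a dependency-free k-nearest-neighbours classifier stands in for the paper's tree-based models such as Random Forest. All function names are hypothetical, and neither the genetic-algorithm feature selection nor LIME is shown, so this is not the authors' implementation.

```python
import random
from math import dist
from statistics import mean


def make_toy_dataset(n=100, seed=0):
    """Hypothetical synthetic data: (metrics, label) pairs.

    Each metric vector is (lines_of_code, cyclomatic_complexity);
    label 1 = smelly, 0 = clean. Values are invented for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smelly = rng.random() < 0.5
        loc = rng.gauss(400 if smelly else 120, 60)
        complexity = rng.gauss(30 if smelly else 8, 5)
        rows.append(((loc, complexity), int(smelly)))
    return rows


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)  # odd k avoids ties


def cross_val_accuracy(data, k, n_folds=10, seed=0):
    """Mean accuracy over n_folds folds; each fold is held out once."""
    scores = []
    for test_idx in k_fold_indices(len(data), n_folds, seed):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct = sum(
            knn_predict(train, data[i][0], k) == data[i][1] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


def grid_search(data, grid, n_folds=10):
    """Try every candidate k; return (best_k, best_score, all_scores)."""
    scores = {k: cross_val_accuracy(data, k, n_folds) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k], scores


if __name__ == "__main__":
    data = make_toy_dataset()
    best_k, best_acc, scores = grid_search(data, grid=[1, 3, 5, 7])
    for k, acc in scores.items():
        print(f"k={k}: mean 10-fold accuracy {acc:.3f}")
    print("best k:", best_k)
```

Replacing the stand-in model with a tree-based one, such as a Random Forest from a machine-learning library, would change only the prediction function and the parameter grid; the cross-validation and grid-search loop keeps the same shape.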

Updated: 2020-11-01