The Effectiveness of Supervised Machine Learning Algorithms in Predicting Software Refactoring
arXiv - CS - Software Engineering. Pub Date: 2020-01-10, DOI: arxiv-2001.03338
Maurício Aniche, Erick Maziero, Rafael Durelli, Vinicius Durelli

Refactoring is the process of changing the internal structure of software to improve its quality without modifying its external behavior. Empirical studies have repeatedly shown that refactoring has a positive impact on the understandability and maintainability of software systems. However, before carrying out refactoring activities, developers need to identify refactoring opportunities. Currently, refactoring opportunity identification relies heavily on developers' expertise and intuition. In this paper, we investigate the effectiveness of machine learning algorithms in predicting software refactorings. More specifically, we train six different machine learning algorithms (i.e., Logistic Regression, Naive Bayes, Support Vector Machine, Decision Trees, Random Forest, and Neural Network) with a dataset comprising over two million refactorings from 11,149 real-world projects from the Apache, F-Droid, and GitHub ecosystems. The resulting models predict 20 different refactorings at the class, method, and variable levels with an accuracy often higher than 90%. Our results show that (i) Random Forests are the best models for predicting software refactoring, (ii) process and ownership metrics seem to play a crucial role in the creation of better models, and (iii) models generalize well in different contexts.

Updated: 2020-09-14
