

Toward the Automatic Classification of Self-Affirmed Refactoring
arXiv - CS - Software Engineering, Pub Date: 2020-09-19, DOI: arxiv-2009.09279
Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni

The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers' explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we manually identified refactoring patterns and defined three main common quality improvement categories (internal quality attributes, external quality attributes, and code smells) by considering only refactoring-related commits. However, this approach depends heavily on the manual inspection of commit messages. In this paper, we propose a two-step approach that first identifies whether a commit describes developer-related refactoring events, and then classifies it according to the common quality improvement categories of refactoring. Specifically, we combine N-Gram TF-IDF feature selection with binary and multiclass classifiers to build a new model that automates the classification of refactorings based on their quality improvement categories. We challenge our model using a total of 2,867 commit messages extracted from well-engineered open-source Java projects. Our findings show that (1) our model is able to accurately classify SAR commits, outperforming the pattern-based and random classifier approaches, and allowing the discovery of 40 more relevant SAR patterns, and (2) our model reaches an F-measure of up to 90% even with a relatively small training dataset.
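
The two-step idea described in the abstract (a binary SAR check followed by a multiclass category assignment over N-gram TF-IDF features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the nearest-centroid cosine classifier is a simple stand-in for whichever binary and multiclass classifiers the paper uses, the class and function names (`TfidfCentroidClassifier`, `classify_commit`) are hypothetical, and the training messages are invented for illustration only. The three category labels follow the categories named in the abstract.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercased word n-grams for n = 1..n_max."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


class TfidfCentroidClassifier:
    """Toy stand-in classifier: N-gram TF-IDF vectors, nearest class centroid by cosine."""

    def fit(self, texts, labels):
        docs = [Counter(ngrams(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency: number of documents containing each n-gram.
        df = Counter(term for d in docs for term in d)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        # Sum the normalized TF-IDF vectors of each class into one centroid.
        self.centroids = {}
        for d, label in zip(docs, labels):
            centroid = self.centroids.setdefault(label, Counter())
            for term, weight in self._vectorize(d).items():
                centroid[term] += weight
        return self

    def _vectorize(self, counts):
        # N-grams unseen during training are ignored.
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def predict(self, text):
        vec = self._vectorize(Counter(ngrams(text)))

        def cosine(centroid):
            dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
            norm = math.sqrt(sum(w * w for w in centroid.values())) or 1.0
            return dot / norm

        return max(self.centroids, key=lambda label: cosine(self.centroids[label]))


def classify_commit(message, sar_model, category_model):
    """Step 1: is the commit a SAR commit? Step 2: if so, which category?"""
    if sar_model.predict(message) != "sar":
        return None
    return category_model.predict(message)


# Invented training data (assumption, not taken from the paper's dataset).
sar_model = TfidfCentroidClassifier().fit(
    ["refactor parser to reduce coupling",
     "extract method to remove duplicate code",
     "refactoring for better readability",
     "cleanup long method in service class",
     "add login endpoint",
     "fix null pointer bug",
     "update version number for release",
     "add unit test for login"],
    ["sar", "sar", "sar", "sar", "other", "other", "other", "other"],
)
category_model = TfidfCentroidClassifier().fit(
    ["refactor to reduce coupling",
     "refactor to improve cohesion",
     "refactoring for better readability",
     "refactor to improve performance",
     "remove duplicate code",
     "split long method"],
    ["internal", "internal", "external", "external", "code smell", "code smell"],
)
```

With this toy data, `classify_commit("extract duplicate code into helper method", sar_model, category_model)` passes the binary step and is then assigned to the code smell category, while a message such as "add login page" is rejected in the first step and returns `None`. Keeping the two models separate mirrors the two-step structure: the category model is only ever consulted for commits the first model accepts.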


Updated: 2020-09-22
