Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model
Information and Software Technology (IF 3.8) Pub Date: 2021-03-10, DOI: 10.1016/j.infsof.2021.106566
Lobna Ghadhab, Ilyes Jenhani, Mohamed Wiem Mkaouer, Montassar Ben Messaoud

Context:

Analyzing software maintenance activities helps ensure cost-effective software evolution and development. Categorizing commits by maintenance task supports practitioners in allocating resources and managing technical debt.

Objective:

In this paper, we propose to use a pre-trained neural language model, namely BERT (Bidirectional Encoder Representations from Transformers), to classify commits into three categories of maintenance tasks: corrective, perfective, and adaptive. The proposed commit classification approach helps the classifier better capture the context of each word in the commit message.
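The classification task above can be sketched end to end: a commit message is encoded into a fixed-size vector and a scoring head picks one of the three maintenance categories. This is a minimal illustrative sketch, not the paper's method — a hash-based bag-of-words vector stands in for the BERT encoder so the example stays self-contained, and the dimensions and weights are toy values.

```python
# Illustrative commit-classification pipeline (hash-based stand-in for BERT).
import hashlib

LABELS = ["corrective", "perfective", "adaptive"]
DIM = 16  # toy embedding size; BERT-base would produce 768-dim vectors

def embed(message: str) -> list[float]:
    """Stand-in encoder: hash each token into a fixed-size count vector."""
    vec = [0.0] * DIM
    for token in message.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec

def classify(message: str, weights: list[list[float]]) -> str:
    """Linear scoring head over the embedding; argmax picks the category."""
    v = embed(message)
    scores = [sum(w * x for w, x in zip(row, v)) for row in weights]
    return LABELS[scores.index(max(scores))]
```

In the paper's setting, `embed` would be replaced by BERT's contextual sentence representation and `weights` learned by fine-tuning, which is precisely where the context-sensitivity benefit comes from.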

Methods:

We built a balanced dataset of 1793 labeled commits collected from publicly available datasets. We used several popular code-change distillers to extract fine-grained code changes, which we incorporated into the dataset as additional features alongside BERT's word-representation features. In our study, a deep neural network (DNN) classifier was used as an additional layer to fine-tune the BERT model on the commit-classification task. Several models were evaluated to provide a deep analysis of the impact of code changes on classification performance for each commit category.
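The feature setup described above — BERT's sentence representation concatenated with counts of fine-grained code-change types before the DNN head — can be sketched as follows. The dimensions and change-type names here are illustrative assumptions, not the paper's exact feature schema.

```python
# Sketch of combining BERT features with fine-grained code-change counts.
BERT_DIM = 768  # hidden size of BERT-base
# Illustrative subset of change types a distiller might report:
CHANGE_TYPES = ["statement_insert", "statement_delete",
                "condition_change", "method_rename"]

def build_feature_vector(bert_embedding, change_counts):
    """Concatenate the BERT vector with per-type code-change counts."""
    assert len(bert_embedding) == BERT_DIM
    counts = [float(change_counts.get(t, 0)) for t in CHANGE_TYPES]
    return list(bert_embedding) + counts
```

In this sketch the combined vector has 768 + 4 dimensions; the DNN classifier layer would then be trained on such vectors, one per commit.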

Results and conclusions:

Experimental results show that the DNN model trained on BERT's word representations together with FixMiner code changes (DNN@BERT+Fix_cc) performed best, achieving 79.66% accuracy and a macro-averaged F1 score of 0.80. Comparison with the state-of-the-art model that combines keywords and code changes (RF@KW+CD_cc) shows that our model improved accuracy by approximately 8%. Results also show that a DNN model using only BERT's word-representation features achieved a 5% improvement in accuracy over the RF@KW+CD_cc model.
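The two metrics reported above — accuracy and the macro-averaged F1 score over the three maintenance categories — can be computed as shown in this short sketch (the label values and evaluation data are illustrative):

```python
# Accuracy and macro-averaged F1 over the three maintenance categories.
LABELS = ["corrective", "perfective", "adaptive"]

def accuracy(y_true, y_pred):
    """Fraction of commits whose predicted category matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    f1s = []
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-averaging matters here because the dataset is balanced across the three categories, so each category's F1 contributes equally to the reported 0.80 score.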




Updated: 2021-03-15