Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt
Frontiers of Computer Science (IF 4.2), Pub Date: 2021-05-25, DOI: 10.1007/s11704-020-9281-z
Dongjin Yu, Lin Wang, Xin Chen, Jie Chen

Technical debt is a metaphor for seeking short-term gains at the expense of long-term code quality. Previous studies have shown that self-admitted technical debt, which is introduced intentionally, has a strong negative impact on software development and incurs high maintenance overhead. To help developers identify self-admitted technical debt, researchers have proposed many state-of-the-art methods. However, the effectiveness of current methods still leaves room for improvement, because self-admitted technical debt comments vary in length, make up a low proportion of all comments, and are diverse in style. Therefore, in this paper, we propose a novel approach based on bidirectional long short-term memory (BiLSTM) networks with an attention mechanism to automatically detect self-admitted technical debt from source code comments. In the BiLSTM, we use a balanced cross-entropy loss function to overcome the class imbalance problem. We experimentally evaluate our approach on a public dataset of 62,566 code comments from ten open source projects. Experimental results show that, on average, our approach achieves 81.75% precision, 72.24% recall and a 75.86% F1-score, and outperforms the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.
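
The abstract does not spell out the form of the balanced cross-entropy loss. As a minimal sketch, assuming one common formulation in which each class is weighted by the frequency of the opposite class, the idea can be illustrated in plain Python. The function name `balanced_bce` and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def balanced_bce(y_true, p_pred, eps=1e-12):
    """Class-balanced binary cross-entropy.

    Assumed formulation (not taken from the paper): positive examples are
    weighted by the fraction of negatives (beta), and negative examples by
    the fraction of positives (1 - beta), so the rare class counts more.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("y_true must not be empty")
    n_pos = sum(y_true)
    beta = (n - n_pos) / n  # weight applied to positive (debt) comments

    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -beta * math.log(p)
        else:
            total += -(1.0 - beta) * math.log(1.0 - p)
    return total / n


# One debt comment among four: the single positive carries as much total
# weight as the three negatives combined.
labels = [1, 0, 0, 0]
loss_uninformed = balanced_bce(labels, [0.5, 0.5, 0.5, 0.5])
loss_missed_positive = balanced_bce(labels, [0.1, 0.1, 0.1, 0.1])
loss_found_positive = balanced_bce(labels, [0.9, 0.1, 0.1, 0.1])
```

With this weighting, a classifier that ignores the rare debt comments is penalized more heavily than it would be under an unweighted loss, which is the motivation the abstract gives for the balanced variant.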




Updated: 2021-05-25