CommtPst: Deep learning source code for commenting positions prediction, Journal of Systems and Software



CommtPst: Deep learning source code for commenting positions prediction
Journal of Systems and Software (IF 3.7), Pub Date: 2020-12-01, DOI: 10.1016/j.jss.2020.110754
Yuan Huang, Xinyu Hu, Nan Jia, Xiangping Chen, Zibin Zheng, Xiapu Luo

Abstract: Existing techniques for automatic code commenting assume that the code snippet to be commented has already been identified, and therefore require users to provide the snippet in advance. A smarter approach would first determine where comments are needed in the given source code and then generate comments for those code snippets. To achieve the first step of this goal, we propose a novel method, CommtPst, to automatically find appropriate commenting positions in source code. Since commenting is closely related to code syntax and semantics, we adopt a neural language model (word embeddings) to capture semantic information and analyze abstract syntax trees to capture syntactic information. We then employ an LSTM (long short-term memory) network to model the long-term logical dependencies among code statements over the fused semantic and syntactic information and to learn the commenting patterns in the code sequence. We evaluated CommtPst on large data sets from dozens of open-source software systems on GitHub. The experimental results show that CommtPst achieves precision, recall, and F-measure values of 0.792, 0.602, and 0.684, respectively, an 11.4% improvement in F-measure over the traditional machine learning method.
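As a quick consistency check on the reported metrics, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract uses the standard balanced F-measure (the harmonic mean of precision and recall); the abstract does not state the formula, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: this is the definition behind the reported 0.684;
    the abstract itself does not spell out the formula.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for CommtPst.
reported_precision = 0.792
reported_recall = 0.602

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))
```

Under that assumption, the recomputed value rounds to the 0.684 given in the abstract, so the three reported figures are mutually consistent.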


Updated: 2020-12-01
