Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors

arXiv - CS - Software Engineering, Pub Date: 2021-06-15, DOI: arxiv-2106.08415
Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, Kevin Moran

Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and a qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models based on the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics, and in our qualitative evaluation, we perform a manual open-coding of the most common errors committed by the models when compared to ground truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors, grounded in an empirically derived error taxonomy that can be used to drive future research efforts.
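To make the notion of a reference-based metric concrete, the following is a minimal sketch of a sentence-level ROUGE-L score, one of the three metrics named in the abstract. It is not the paper's evaluation script: the function names `lcs_length` and `rouge_l_f1`, the lowercase whitespace tokenization, and the use of a balanced F1 (equal weight on precision and recall) are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L as a balanced F1 score.

    Assumption: lowercase whitespace tokenization and equal weighting of
    precision and recall; published evaluations may tokenize and weight
    differently.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical ground-truth comment and model prediction.
    reference = "returns the maximum value in the list"
    prediction = "returns the largest value in a list"
    print(round(rouge_l_f1(reference, prediction), 3))
```

In this made-up pair, the prediction loses credit for writing "largest" where the reference says "maximum", even though a human reader would likely treat the two as equivalent. A score of this kind measures token overlap only, which is one reason a manual error analysis can complement it.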


Updated: 2021-06-17
