Why My Code Summarization Model Does Not Work
ACM Transactions on Software Engineering and Methodology ( IF 6.6 ) Pub Date : 2021-02-10 , DOI: 10.1145/3434280
Qiuyuan Chen 1 , Xin Xia 2 , Han Hu 2 , David Lo 3 , Shanping Li 1

Code summarization aims at generating a code comment for a given block of source code; it is normally performed by training machine learning algorithms on existing code block-comment pairs. In practice, code comments have different intentions. For example, some code comments explain how a method works, while others explain why the method was written. Previous work has shown that a relationship exists between a code block and the category of the comment associated with it. In this article, we investigate to what extent we can exploit this relationship to improve code summarization performance. We first classify comments into six intention categories and manually label 20,000 code-comment pairs. The categories are "what," "why," "how-to-use," "how-it-is-done," "property," and "others." Based on this dataset, we conduct an experiment to investigate the performance of different state-of-the-art code summarization approaches on each category. We find that the performance of different code summarization approaches varies substantially across the categories. Moreover, the category on which a code summarization model performs best differs across models. In particular, no model performs best on "why" or "property" comments among the six categories. We design a composite approach to demonstrate that comment category prediction can boost code summarization performance. The approach leverages the category-labeled data to train a classifier that infers categories; it then selects the most suitable model for each inferred category and outputs the composite results. Our composite approach outperforms approaches that do not consider comment categories, obtaining relative improvements of 8.57% and 16.34% in ROUGE-L and BLEU-4 score, respectively.
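The composite approach described in the abstract can be sketched as follows. The six category names come from the paper, but everything else here — the heuristic classifier and the dummy per-category models — is a hypothetical placeholder, not the authors' implementation, which trains a real classifier on the 20,000 labeled pairs.

```python
# Sketch of the composite approach: (1) infer the intention category of
# the comment that should accompany a code block, (2) route the block to
# the summarization model that performs best on that category, and
# (3) return that model's output. The classifier and models below are
# toy stand-ins for illustration only.

CATEGORIES = ["what", "why", "how-to-use", "how-it-is-done", "property", "others"]

def predict_category(code: str) -> str:
    # Placeholder for a classifier trained on the labeled code-comment pairs.
    if "def " in code and "return" in code:
        return "what"
    if "raise" in code:
        return "why"
    return "others"

# Dummy summarization models standing in for the state-of-the-art systems
# compared in the paper.
def model_a(code: str) -> str:
    return "Describes what the code does."

def model_b(code: str) -> str:
    return "Explains why the code exists."

# Hypothetical mapping from each category to its best-performing model,
# as determined by the per-category experiment.
BEST_MODEL = {c: model_a for c in CATEGORIES}
BEST_MODEL["why"] = model_b

def composite_summarize(code: str) -> str:
    # Route the code block to the model chosen for its inferred category.
    return BEST_MODEL[predict_category(code)](code)

print(composite_summarize("def f(x):\n    return x * 2"))
```

The key design point is that routing happens per input: each code block is summarized by whichever model is strongest for its predicted comment category, which is what lets the composite exceed any single model.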

Updated: 2021-02-10