

Effective approaches to combining lexical and syntactical information for code summarization
Software: Practice and Experience (IF 2.6), Pub Date: 2020-09-24, DOI: 10.1002/spe.2893
Ziyi Zhou¹, Huiqun Yu¹,², Guisheng Fan¹


Natural language summaries of source code are important during software development and maintenance. Recently, deep learning based models, which encode the token sequence or abstract syntax tree (AST) of code with neural networks, have achieved good performance on the task of automatic code summarization. However, there has been little work on efficiently combining the lexical and syntactical information of code to improve summarization quality. In this paper, we propose two general and effective approaches to leveraging both types of information: a convolutional neural network that aims to better extract vector representations of AST nodes for downstream models, and a Switch Network that learns an adaptive weight vector to combine different code representations for summary generation. We integrate these approaches into a comprehensive code summarization model, which includes a sequential encoder for the token sequence of code and a tree-based encoder for its AST. We evaluate our model on a large Java dataset. The experimental results show that our model outperforms several state-of-the-art models on various metrics, and that the proposed approaches contribute substantially to the improvements.
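The abstract does not give the Switch Network's equations, so the following is only an illustrative sketch of one common way an adaptive weight vector could blend two code representations: an element-wise sigmoid gate. The function name `switch_combine`, the parameters `weights` and `bias`, and the gating formula itself are assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def switch_combine(c_tok, c_ast, weights, bias):
    """Blend two equal-length vectors with an element-wise gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        g   = sigmoid(W [c_tok; c_ast] + b)
        out = g * c_tok + (1 - g) * c_ast
    c_tok stands in for a token-sequence representation, c_ast for an
    AST representation; weights has one row of length 2*d per dimension.
    """
    if len(c_tok) != len(c_ast):
        raise ValueError("both representations must have the same dimension")
    concat = list(c_tok) + list(c_ast)
    # One gate value per output dimension, each in (0, 1).
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    # Convex combination of the two representations, dimension by dimension.
    combined = [g * t + (1.0 - g) * a for g, t, a in zip(gate, c_tok, c_ast)]
    return combined, gate


if __name__ == "__main__":
    # Untrained (all-zero) parameters give a neutral gate of 0.5,
    # so the result is the plain average of the two vectors.
    zero_w = [[0.0] * 4, [0.0] * 4]
    print(switch_combine([1.0, 3.0], [3.0, 5.0], zero_w, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned jointly with the encoders, so the gate could favour the lexical or the syntactical representation differently in each dimension.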


Updated: 2020-09-24
