A hybrid code representation learning approach for predicting method names
Journal of Systems and Software (IF 3.5), Pub Date: 2021-05-27, DOI: 10.1016/j.jss.2021.111011
Fengyi Zhang, Bihuan Chen, Rongfan Li, Xin Peng

Program semantic properties such as class names, method names, and variable names and types play an important role in software development and maintenance. Method names are of particular importance because they provide the cornerstone of abstraction for developers to communicate with each other for various purposes (e.g., code review and program comprehension). Existing method name prediction approaches often represent code as lexical tokens or syntactic AST (abstract syntax tree) paths, which makes it difficult for them to learn code semantics and hinders their effectiveness in predicting method names. Initial attempts have been made to represent code as execution traces to capture code semantics, but they suffer from scalability problems in collecting execution traces.

In this paper, we propose a hybrid code representation learning approach, named Meth2Seq, to encode a method as a sequence of distributed vectors. To capture code semantics scalably, Meth2Seq represents a method as (1) a bag of paths on the program dependence graph, (2) a sequence of typed intermediate representation statements, and (3) a natural language comment sentence. The learned sequence of vectors of a method is fed to a decoder model to predict method names. Our evaluation on a dataset of 280.5K methods in 67 Java projects demonstrates that Meth2Seq outperforms two state-of-the-art code representation learning approaches in F1-score by 92.6% and 36.6%, and two state-of-the-art method name prediction approaches in F1-score by 85.6% and 178.1%.
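
The abstract reports F1-scores without stating how they are computed. As an illustration only, the sketch below assumes the subtoken-level convention often used in method name prediction: predicted and reference names are split into subtokens, and precision and recall are taken over their overlap. It also assumes that "by 92.6%" denotes a relative improvement over a baseline. The function names `subtokens`, `subtoken_f1`, and `relative_improvement` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def subtokens(name):
    """Split a camelCase or snake_case method name into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def subtoken_f1(predicted, reference):
    """F1 over the multiset of subtokens (an assumed metric, not the paper's stated one)."""
    pred = Counter(subtokens(predicted))
    ref = Counter(subtokens(reference))
    overlap = sum((pred & ref).values())  # subtokens shared by both names
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline` (assumed reading of 'by X%')."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    print(subtokens("parseXMLFile"))
    print(subtoken_f1("getName", "getUserName"))
    print(relative_improvement(50.0, 25.0))
```

Under this reading, a prediction such as `getName` for the reference `getUserName` earns partial credit rather than counting as a complete miss, which is why subtoken-level scoring is a common choice for this task.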




Updated: 2021-06-02