Modular Tree Network for Source Code Representation Learning
ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2020-09-26, DOI: 10.1145/3409331
Wenhan Wang 1 , Ge Li 1 , Sijie Shen 1 , Xin Xia 2 , Zhi Jin 1

Learning representations of source code is the foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural information of programs. Although abstract syntax tree (AST)-based neural models can handle the tree structure of source code, they cannot capture the richness of the different types of substructure in programs. In this article, we propose a modular tree network that dynamically composes different neural network units into tree structures based on the input AST. Unlike previous tree-structured neural network models, a modular tree network can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance among state-of-the-art approaches on both tasks, showing the advantage of leveraging more elaborate structural information from the source code.
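
The idea of composing different units according to the input AST can be illustrated with a toy sketch. This is not the authors' model: it assumes a plain recursive encoder with one randomly initialized weight matrix per AST node type, uses Python's own `ast` module as the parser, and every name in it (`ModularTreeEncoder`, `encode_source`, `DIM`) is hypothetical. There is no training, and leaf tokens such as identifier names and constant values are ignored.

```python
import ast
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration


def _random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ModularTreeEncoder:
    """Toy recursive encoder: one parameter set ("module") per AST node type."""

    def __init__(self, dim=DIM, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.modules = {}  # node type name -> (weights, bias)

    def _module_for(self, node_type):
        # Lazily create a separate parameter set for each node type seen.
        if node_type not in self.modules:
            weights = _random_matrix(self.rng, self.dim, self.dim)
            bias = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
            self.modules[node_type] = (weights, bias)
        return self.modules[node_type]

    def encode(self, node):
        # Encode the children first and sum their vectors (zero for leaves).
        summed = [0.0] * self.dim
        for child in ast.iter_child_nodes(node):
            child_vec = self.encode(child)
            summed = [s + c for s, c in zip(summed, child_vec)]
        # The node type decides which module transforms the summed vector.
        weights, bias = self._module_for(type(node).__name__)
        return [
            math.tanh(sum(w * x for w, x in zip(row, summed)) + b)
            for row, b in zip(weights, bias)
        ]


def encode_source(source, encoder):
    """Parse Python source and return a fixed-size vector for the whole tree."""
    return encoder.encode(ast.parse(source))
```

Calling `encode_source("x = 1 + 2", ModularTreeEncoder())` returns a list of `DIM` floats, and the encoder's `modules` dictionary ends up with one entry per node type encountered (for example `Assign` and `BinOp`). The point of the sketch is only that the network's shape and the parameters used at each position follow the input tree.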

Updated: 2020-09-26