TreeCaps: Tree-Based Capsule Networks for Source Code Processing
arXiv - CS - Programming Languages, Pub Date: 2020-09-05, DOI: arxiv-2009.09777
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

Recently, program learning techniques have been proposed to process source code based on syntactic structures (e.g., abstract syntax trees) and/or semantic information (e.g., dependency graphs). Although graphs may capture various viewpoints of code semantics better than trees, constructing graph inputs from code requires static semantic analysis that may be inaccurate and can introduce noise during learning. Syntax trees, by contrast, are precisely defined by the language grammar and are easier to construct and process than graphs, yet previous tree-based learning techniques have not been able to learn enough semantic information from trees to achieve better accuracy than graph-based techniques. We propose a new learning technique, named TreeCaps, that fuses capsule networks with tree-based convolutional neural networks to achieve higher learning accuracy than existing graph-based techniques while relying only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Aside from accuracy, we also find that TreeCaps is the most robust against semantic-preserving program transformations, which change code syntax without modifying its semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code in both accuracy and robustness on program comprehension tasks such as code functionality classification and function name prediction.
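To make the notion of a semantic-preserving program transformation concrete, here is a minimal sketch of one such transformation, variable renaming, applied to a syntax tree. It is an illustration only and not taken from the paper: the paper evaluates Java and C/C++ programs, whereas this sketch uses Python's standard `ast` module for brevity, and the names `RenameVariable`, `SOURCE`, and `run` are hypothetical.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier (hypothetical helper)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Variable reads and writes are Name nodes.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Function parameters are arg nodes.
        if node.arg == self.old:
            node.arg = self.new
        return node


# Assumed toy program; any function with a local variable would do.
SOURCE = """
def total(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc
"""


def run(tree, data):
    """Compile a module AST and call its `total` function on `data`."""
    namespace = {}
    exec(compile(tree, "<ast>", "exec"), namespace)
    return namespace["total"](data)


original = ast.parse(SOURCE)
transformed = RenameVariable("acc", "running_sum").visit(ast.parse(SOURCE))
ast.fix_missing_locations(transformed)

# The trees differ syntactically, yet both programs compute the same result.
syntax_changed = ast.dump(original) != ast.dump(transformed)
same_behaviour = run(original, [1, 2, 3]) == run(transformed, [1, 2, 3])
```

A model that is robust in the sense described above would ideally give the same prediction for `original` and `transformed`, since only the identifier names in the tree differ.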

Updated: 2020-09-22
