Code Clone Detection with Hierarchical Attentive Graph Embedding, International Journal of Software Engineering and Knowledge Engineering



Code Clone Detection with Hierarchical Attentive Graph Embedding
International Journal of Software Engineering and Knowledge Engineering (IF 0.6), Pub Date: 2021-06-21, DOI: 10.1142/s021819402150025x
Xiujuan Ji, Lei Liu, Jingwen Zhu


Code cloning is a common programming practice in which existing code is reused to solve similar programming problems. It greatly facilitates software development, but it also propagates bugs and increases maintenance costs. Recently, deep learning-based detection approaches have shown their effectiveness in feature representation and detection performance. Among them, approaches based on the abstract syntax tree (AST) build their models on node embedding techniques. In an AST, the semantics of nodes are clearly hierarchical, and nodes differ considerably in how much they contribute to deciding whether two code fragments are clones. Existing approaches nevertheless have three limitations. First, some do not fully consider the hierarchical structure information of source code. Second, some ignore the differing importance of nodes when generating source code features. Third, when the tree is very large and deep, many approaches are vulnerable to the vanishing gradient problem during training. To address these challenges, we propose HAG, a hierarchical attentive graph neural network embedding model for code clone detection. First, an attention mechanism is applied to the nodes of the AST to distinguish the importance of different nodes during model training. In addition, HAG adopts a graph convolutional network (GCN) to propagate code information over the AST graph and then exploits a hierarchical differential pooling GCN to capture code semantics at different structural levels. To evaluate the effectiveness of HAG, we conducted extensive experiments on a public clone dataset and compared HAG with seven state-of-the-art clone detection models. The experimental results demonstrate that HAG achieves superior detection performance compared with the baseline models. In particular, HAG outperforms the baselines most clearly in the detection of Moderately Type-3 or Type-4 clones, indicating its strong detection capability for semantic clones. In addition, the impacts of hierarchical pooling, the attention mechanism, and critical model parameters are systematically discussed.
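
The abstract names three ingredients: attention over AST nodes, GCN message passing on the AST graph, and hierarchical pooling. The sketch below is only a toy illustration of how such pieces could fit together, not the authors' HAG implementation. Everything in it is an assumption made for illustration: the function names, the ordering of the steps, the soft cluster-assignment pooling, the mean readout, the cosine comparison, and all weights and toy graphs. A real model would learn the weights and would normally be built with a deep learning library.

```python
import math

def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_adjacency(edges, n):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def attend_nodes(features, query):
    """Scale each node's features by a softmax attention weight (assumed form)."""
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    weights = softmax(scores)
    scaled = [[w * v for v in row] for w, row in zip(weights, features)]
    return scaled, weights

def gcn_layer(a_hat, h, w):
    """One GCN propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def soft_pool(a_hat, h, w_assign):
    """Coarsen the graph with a soft cluster assignment S (one pooling level)."""
    logits = matmul(matmul(a_hat, h), w_assign)
    s = [softmax(row) for row in logits]          # n nodes x k clusters
    s_t = transpose(s)
    h_coarse = matmul(s_t, h)                     # k x d cluster features
    a_coarse = matmul(matmul(s_t, a_hat), s)      # k x k cluster adjacency
    return h_coarse, a_coarse

def embed(edges, features, w_gcn, w_assign, query):
    """Attention -> GCN -> one pooling level -> mean readout (assumed order)."""
    a_hat = normalize_adjacency(edges, len(features))
    x, weights = attend_nodes(features, query)
    h = gcn_layer(a_hat, x, w_gcn)
    h_coarse, _ = soft_pool(a_hat, h, w_assign)
    vec = [sum(col) / len(h_coarse) for col in zip(*h_coarse)]
    return vec, weights

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

# Hypothetical toy "ASTs": one-hot node-type features and parent-child edges.
EDGES_A = [(0, 1), (0, 2), (2, 3)]
FEATURES_A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
EDGES_B = [(0, 1), (1, 2)]
FEATURES_B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hand-picked (not learned) parameters, for illustration only.
W_GCN = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]]
W_ASSIGN = [[1.0, -1.0], [-1.0, 1.0]]
QUERY = [0.2, 0.5, 1.0]

vec_a, weights_a = embed(EDGES_A, FEATURES_A, W_GCN, W_ASSIGN, QUERY)
vec_b, _ = embed(EDGES_B, FEATURES_B, W_GCN, W_ASSIGN, QUERY)
print("similarity:", cosine(vec_a, vec_b))
```

In a clone detector along these lines, the similarity score (or a learned classifier over the two embeddings) would be thresholded to decide whether a pair of fragments is a clone. Stacking several `soft_pool` levels would give representations at progressively coarser structural levels.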


Updated: 2021-06-21
