A Deep Learning Approach for a Source Code Detection Model Using Self-Attention

Complexity (IF 2.3), Pub Date: 2020-09-16, DOI: 10.1155/2020/5027198
Yao Meng, Long Liu


With the development of deep learning, many neural-network-based approaches have been proposed for code clone detection. In this paper, we propose At-biLSTM, a novel source code detection model based on a bidirectional LSTM network with a self-attention layer. At-biLSTM is composed of a representation model and a discriminative model. The representation model first transforms the source code into an abstract syntax tree and splits it into a sequence of statement trees; it then encodes each statement tree with a depth-first traversal algorithm. Finally, it encodes the sequence of statement vectors via a bidirectional LSTM network, a classical deep learning architecture, with a self-attention layer, and outputs a vector representing the given source code. The discriminative model identifies code clones based on the vectors generated by the representation model. The proposed model retains both the syntax and the semantics of the source code during encoding, and the self-attention mechanism makes the classifier concentrate on key statements, which improves classification performance. Comparative experiments on the benchmarks OJClone and BigCloneBench indicate that At-biLSTM is effective and outperforms state-of-the-art approaches in source code clone detection.
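The pipeline described in the abstract (syntax tree, statement trees, depth-first encoding, attention-weighted pooling, comparison) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: Python's standard `ast` module stands in for the paper's parser, a hashed bag of node types (`embed`) stands in for the learned statement encoder and the bidirectional LSTM, a fixed `query` vector stands in for learned attention parameters, and cosine similarity stands in for the discriminative model. The names `DIM`, `statement_trees`, `node_types`, `embed`, `attention_pool`, `encode_source`, and `cosine` are hypothetical.

```python
import ast
import math
import zlib

DIM = 16  # hypothetical embedding size, chosen only for illustration


def statement_trees(source):
    """Parse source and collect statement nodes in depth-first (pre-order) order."""
    found = []

    def visit(node):
        if isinstance(node, ast.stmt):
            found.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return found


def node_types(node, root=True):
    """Depth-first list of node type names for one statement.

    Nested statements are skipped so that each statement is encoded once.
    """
    if isinstance(node, ast.stmt) and not root:
        return []
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        names.extend(node_types(child, root=False))
    return names


def embed(names):
    """Stand-in statement encoder (assumption): normalized hashed bag of node types."""
    vec = [0.0] * DIM
    for name in names:
        vec[zlib.crc32(name.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def attention_pool(vectors, query):
    """Softmax over dot-product scores, then a weighted sum of statement vectors."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(DIM)]
    return pooled, weights


def encode_source(source, query=None):
    """Map source code to one vector plus the per-statement attention weights."""
    query = query or [1.0 / DIM] * DIM  # fixed here; learned in a real model
    vectors = [embed(node_types(stmt)) for stmt in statement_trees(source)]
    return attention_pool(vectors, query)


def cosine(a, b):
    """Stand-in for the discriminative model (assumption): cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because this sketch encodes only node types, two snippets that differ solely in identifier names map to the same vector. A trained model would instead learn the embeddings and attention weights from labeled clone pairs.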

Updated: 2020-09-16
