Self-Attention Networks for Code Search
Information and Software Technology (IF 3.9) Pub Date: 2021-02-10, DOI: 10.1016/j.infsof.2021.106542
Sen Fang, You-Shuai Tan, Tao Zhang, Yepang Liu

Context:

Developers often search for and reuse code snippets from large-scale codebases when they want to implement functionality that already exists in previous projects, which can improve the efficiency of software development.

Objective:

As the first deep learning-based code search model, DeepCS outperforms prior models such as Sourcerer and CodeHow. However, it uses two separate LSTMs to represent code snippets and natural language descriptions, which ignores the semantic relations between code snippets and their descriptions. Consequently, the performance of DeepCS hits a bottleneck, and our objective is to break through it.
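For context, the dual-encoder design criticized here can be sketched as below. This is a minimal illustrative PyTorch sketch, not the authors' implementation; the class names, dimensions, and single-LSTM-per-side simplification are all assumptions:

```python
import torch
import torch.nn as nn

class LSTMEncoder(nn.Module):
    """One of the two separate encoders: embeds a token sequence
    and max-pools the LSTM states into a single vector."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        states, _ = self.lstm(self.embed(tokens))
        return states.max(dim=1).values        # pool over time steps

# Code and description are encoded independently; the two vectors
# only interact at the final similarity computation, so semantic
# relations between the sequences are never modeled directly.
code_encoder = LSTMEncoder(vocab_size=10000)
desc_encoder = LSTMEncoder(vocab_size=10000)
code_vec = code_encoder(torch.randint(0, 10000, (4, 50)))
desc_vec = desc_encoder(torch.randint(0, 10000, (4, 20)))
similarity = nn.functional.cosine_similarity(code_vec, desc_vec)
```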

Method:

We propose a self-attention joint representation learning model named SAN-CS (Self-Attention Network for Code Search). In contrast to DeepCS, we build our code search model directly on self-attention networks. Through a weighted-average operation, self-attention networks can fully capture the contextual information of code snippets and their descriptions. We first use two individual self-attention networks to represent code snippets and their descriptions, respectively, and then apply a further self-attention network as a joint representation network over the two, which builds semantic relationships between code snippets and their descriptions. SAN-CS can therefore break the performance bottleneck of DeepCS.
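To make the two-stage architecture concrete, here is a minimal PyTorch sketch of the idea. It is illustrative only: the layer sizes, pooling, and the concatenate-then-attend form of the joint step are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SelfAttentionEncoder(nn.Module):
    """Individual encoder: each token's representation becomes a
    weighted average over all tokens in the same sequence."""
    def __init__(self, vocab_size, dim=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens)
        out, _ = self.attn(x, x, x)             # self-attention
        return out                              # (batch, seq_len, dim)

class JointRepresentation(nn.Module):
    """Joint step (assumed form): the two token sequences are
    concatenated and attended over together, so code tokens can
    attend to description tokens and vice versa."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, code_repr, desc_repr):
        joint = torch.cat([code_repr, desc_repr], dim=1)
        out, _ = self.attn(joint, joint, joint)
        return out.mean(dim=1)                  # pooled joint vector

code_enc = SelfAttentionEncoder(vocab_size=10000)
desc_enc = SelfAttentionEncoder(vocab_size=10000)
joint_net = JointRepresentation()
code = torch.randint(0, 10000, (4, 50))         # a batch of code snippets
desc = torch.randint(0, 10000, (4, 20))         # matching descriptions
joint_vec = joint_net(code_enc(code), desc_enc(desc))   # (4, 128)
```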

Results:

We evaluate SAN-CS on the dataset shared by Gu et al. and choose two baseline models, DeepCS and CARLCS-CNN. Experimental results demonstrate that SAN-CS achieves significantly better performance than both baselines. In addition, SAN-CS is more efficient than DeepCS in both the training and testing phases.

Conclusion:

This paper proposes a code search model, SAN-CS. It uses self-attention networks to compute joint attention representations of code snippets and their descriptions. Experimental results verify the effectiveness and efficiency of SAN-CS.



Updated: 2021-02-15