Code Search Intent Classification Using Weak Supervision
arXiv - CS - Artificial Intelligence. Pub Date: 2020-11-24, DOI: arxiv-2011.11950
Nikitha Rao, Chetan Bansal, Joe Guan

Developers use search for various tasks such as finding code, documentation, debugging information, etc. In particular, web search is heavily used by developers for finding code examples and snippets during the coding process. Recently, natural-language-based code search has been an active area of research. However, the lack of real-world large-scale datasets is a significant bottleneck. In this work, we propose a weak-supervision-based approach for detecting code search intent in search queries for the C# and Java programming languages. We evaluate the approach against several baselines on a real-world dataset comprising over 1 million queries mined from the Bing web search engine and show that the CNN-based model can achieve an accuracy of 77% and 76% for C# and Java respectively. Furthermore, we are also releasing the first large-scale real-world dataset of code search queries mined from the Bing web search engine. We hope that the dataset will aid future research on code search.
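
The abstract does not describe the labeling rules, so the following is only a minimal sketch of how weak supervision is commonly set up for query intent, not the authors' method. It assumes a handful of hypothetical heuristic labeling functions (the keyword lists, the regular expression, and the names `lf_has_code_keyword`, `lf_has_call_syntax`, `lf_mentions_install` and `weak_label` are all invented for illustration) whose votes are combined by simple majority to produce noisy training labels for a downstream classifier such as a CNN.

```python
import re
from collections import Counter

# Label values used by the hypothetical labeling functions below.
CODE, NOT_CODE, ABSTAIN = 1, 0, -1


def lf_has_code_keyword(query):
    """Vote CODE if the query contains a phrase typical of code lookups (assumed list)."""
    keywords = ("example", "snippet", "sample code", "how to")
    return CODE if any(k in query.lower() for k in keywords) else ABSTAIN


def lf_has_call_syntax(query):
    """Vote CODE if the query looks like it contains a method call, e.g. 'string.format('."""
    return CODE if re.search(r"\w+\.\w+\(|\(\)", query) else ABSTAIN


def lf_mentions_install(query):
    """Vote NOT_CODE for queries that look navigational or about setup (assumed list)."""
    words = ("download", "install", "release date")
    return NOT_CODE if any(w in query.lower() for w in words) else ABSTAIN


LABELING_FUNCTIONS = [lf_has_code_keyword, lf_has_call_syntax, lf_mentions_install]


def weak_label(query):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(query) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence, leave the query unlabeled
    return ranked[0][0]


if __name__ == "__main__":
    for q in ["c# read file example", "download java jdk", "java"]:
        print(q, "->", weak_label(q))
```

Queries on which every function abstains, or on which the votes tie, stay unlabeled. In practice a learned label model is often used in place of the plain majority vote shown here.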

Updated: 2020-11-25
