Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding, IEEE Transactions on Software Engineering

Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2020-10-01, DOI: 10.1109/tse.2018.2876006
Xiaochen Li, He Jiang, Yasutaka Kamei, Xin Chen

Developers increasingly rely on text matching tools to analyze the relation between natural language words and APIs. However, semantic gaps, namely textual mismatches between words and APIs, negatively affect these tools. Previous studies have transformed words or APIs into low-dimensional vectors for matching; however, the results were inaccurate because these approaches failed to model words and APIs simultaneously. To resolve this problem, two main challenges must be addressed: the acquisition of massive words and APIs for mining, and the alignment of words and APIs for modeling. Therefore, this study proposes Word2API to effectively estimate the relatedness of words and APIs. Word2API collects millions of commonly used words and APIs from code repositories to address the acquisition challenge. Then, a shuffling strategy is used to transform related words and APIs into tuples to address the alignment challenge. Using these tuples, Word2API models words and APIs simultaneously. Word2API outperforms baselines by 10-49.6 percent in relatedness estimation, measured by precision and NDCG. Word2API is also effective in solving typical software tasks, e.g., query expansion and API document linking. A simple system with Word2API-expanded queries recommends up to 21.4 percent more related APIs for developers. Meanwhile, Word2API improves comparison algorithms by 7.9-17.4 percent in linking questions in Question&Answer communities to API documents.
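
The abstract does not spell out how the shuffling strategy works, so the following is only a minimal sketch under stated assumptions: that a tuple pairs the words describing a piece of code with the APIs that code calls, and that shuffling the merged sequence places words and APIs close enough together for a window-based word embedding model to learn from both. The function name `shuffled_sequences`, the `copies` parameter, and the sample words and API names are hypothetical and do not come from the paper.

```python
import random


def shuffled_sequences(words, apis, copies=3, seed=0):
    """Merge words and API names, then return `copies` shuffled orderings.

    Assumption: interleaving the two vocabularies lets an embedding model
    with a fixed context window see words and APIs as neighbours.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    merged = list(words) + list(apis)
    sequences = []
    for _ in range(copies):
        sequence = merged[:]   # copy so each ordering is independent
        rng.shuffle(sequence)  # in-place shuffle of the copy
        sequences.append(sequence)
    return sequences


# Hypothetical word-API tuple, for illustration only.
words = ["read", "text", "file"]
apis = ["FileReader#new", "BufferedReader#readLine"]
training_sequences = shuffled_sequences(words, apis)
```

Each returned sequence holds the same tokens in a different order. Such sequences could then be passed to any word embedding trainer, and the relatedness of a word and an API estimated from the similarity of their vectors.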

Updated: 2020-10-01
