Learning Autocompletion from Real-World Datasets
arXiv - CS - Software Engineering Pub Date : 2020-11-09 , DOI: arxiv-2011.04542
Gareth Ari Aye, Seohyun Kim, Hongyu Li

Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study, "When Code Completion Fails: A Case Study on Real-World Completions," demonstrates that these results may not translate to improvements in real-world performance. To combat this effect, we train models on real-world code completion examples and find that these models outperform models trained on committed source code and working version snapshots by 12.8% and 13.8% in accuracy, respectively. We observe this improvement across modeling technologies and show through A/B testing that it corresponds to a 6.2% increase in programmers' actual autocompletion usage. Furthermore, our study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.

Updated: 2020-11-10