Generating Question Titles for Stack Overflow from Mined Code Snippets
ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2020-09-26, DOI: 10.1145/3401026
Zhipeng Gao 1 , Xin Xia 1 , John Grundy 1 , David Lo 2 , Yuan-Fang Li 1

Stack Overflow is heavily used by software developers as a way to seek programming-related information from peers over the internet. The Stack Overflow community recommends that users include the relevant code snippet when creating a question, so that others can better understand the problem and offer help. Previous studies have shown that a significant number of these questions are of low quality and do not attract other potential experts on Stack Overflow. These poorly asked questions are less likely to receive useful answers, and they hinder the overall process of knowledge generation and sharing. One reason low-quality questions appear on Stack Overflow (SO) is that many developers cannot clarify and summarize the key problem behind the code snippet they present, either because they lack the knowledge and terminology related to the problem or because of poor writing skills. In this study we propose an approach that helps developers write high-quality questions by automatically generating a question title for a code snippet using deep sequence-to-sequence learning. Our approach is fully data-driven. It uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-word problem, and a coverage mechanism to reduce word repetition. We evaluate our approach on Stack Overflow datasets covering a variety of programming languages (e.g., Python, Java, JavaScript, C# and SQL). Our experimental results show that it significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets to help other researchers verify their ideas and to inspire follow-up work.
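
The three mechanisms named in the abstract can be illustrated with a toy decoding step. The sketch below is not the authors' released code. It assumes a pointer-generator-style formulation, and every name in it (`decode_step`, `p_gen`, `coverage_penalty`) is hypothetical. In particular, subtracting the coverage vector from the attention scores is a simplification chosen for readability; a trained model would normally learn how coverage enters the attention computation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(attn_scores, coverage, vocab_dist, source_tokens, p_gen,
                coverage_penalty=1.0):
    """One illustrative decoding step (hypothetical helper, not the paper's API).

    attn_scores   : raw attention score for each source (code) token
    coverage      : attention mass each source token has received so far
    vocab_dist    : dict word -> probability from the generator (sums to 1)
    source_tokens : tokens of the code snippet, aligned with attn_scores
    p_gen         : probability of generating from the vocabulary vs. copying
    """
    # Attention with coverage (assumed simplification): source positions that
    # were already attended get their score reduced.
    adjusted = [s - coverage_penalty * c for s, c in zip(attn_scores, coverage)]
    attention = softmax(adjusted)

    # Copy mechanism: mix the vocabulary distribution with the attention
    # distribution, so rare source tokens (e.g. identifiers) can be emitted.
    final = {word: p_gen * p for word, p in vocab_dist.items()}
    for token, a in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * a

    # Coverage loss: overlap between the current attention and past attention.
    coverage_loss = sum(min(a, c) for a, c in zip(attention, coverage))
    new_coverage = [c + a for c, a in zip(coverage, attention)]
    return final, new_coverage, coverage_loss
```

Calling `decode_step` twice with the same scores, feeding the returned coverage vector into the second call, lowers the probability of the token that dominated the first step. That is the intended effect of a coverage mechanism on repetition.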

Updated: 2020-09-26