On the relationship between bug reports and queries for text retrieval-based bug localization, Empirical Software Engineering

Empirical Software Engineering (IF 4.1), Pub Date: 2020-07-13, DOI: 10.1007/s10664-020-09823-w
Chris Mills, Esteban Parra, Jevgenija Pantiuchina, Gabriele Bavota, Sonia Haiduc

As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed to make a system safer and more reliable. Unfortunately, manually attempting to locate bugs solely from the information in a bug report requires advanced knowledge of how a system is constructed and the way its constituent pieces interact. Therefore, previous work has investigated numerous techniques for reducing the human effort spent in bug localization. One of the most common approaches is Text Retrieval (TR) in which a system’s source code is indexed into a search space that is then queried for code relevant to a given bug report. In the last decade, dozens of papers have proposed improvements to bug localization using TR with largely positive results. However, several other studies have called the technique into question. According to these studies, evaluations of TR-based approaches often lack sufficient controls on biases that artificially inflate the results, namely: misclassified bugs, tangled commits, and localization hints. Here we argue that contemporary evaluations of TR approaches also include a negative bias that outweighs the previously identified positive biases: while TR approaches expect a natural language query, most evaluations simply formulate this query as the full text of a bug report. In this study we show that highly performing queries can be extracted from the bug report text, in order to make TR effective even without the aforementioned positive biases. Further, we analyze the provenance of terms in these highly performing queries to drive future work in automatic query extraction from bug reports.
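The TR setup described in the abstract (index the source code, then query the index with text from a bug report) can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF weighting, cosine ranking, file names, bug report text, and the hand-picked reduced query below are all assumptions chosen for illustration. It only shows how the full bug report text and a shorter query drawn from it can be run against the same index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Build TF-IDF vectors for each source file and return them with the IDF table."""
    counts = {name: Counter(tokenize(body)) for name, body in documents.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF so that every indexed term gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, vectors, idf):
    """Return file names ordered from most to least similar to the query."""
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the indexed code are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus standing in for a system's source code.
corpus = {
    "LoginController.java": "class LoginController { void handleLogin(User user) "
    "{ validatePassword(user); createSession(user); } }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) "
    "{ renderTable(report); writeFile(report); } }",
    "CacheManager.java": "class CacheManager { void evictEntry(String key) "
    "{ removeEntry(key); } }",
}
vectors, idf = build_index(corpus)

# Hypothetical bug report used verbatim as the query, and a reduced query
# made of terms selected by hand from that same report.
full_report = (
    "Exporting a report to PDF crashes. "
    "Steps: login as user, open report, click export."
)
reduced_query = "export pdf report"

print("full report  :", rank(full_report, vectors, idf))
print("reduced query:", rank(reduced_query, vectors, idf))
```

In this toy setting the full report also contains terms such as "login" and "user" that match an unrelated file, while the reduced query only matches the exporter. Real evaluations use much larger corpora and established TR engines, so the sketch should be read as an illustration of the query formulation question, not as evidence for the paper's results.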

Updated: 2020-07-13
