GloBug: Using global data in Fault Localization, Journal of Systems and Software


GloBug: Using global data in Fault Localization
Journal of Systems and Software (IF 3.7), Pub Date: 2021-03-29, DOI: 10.1016/j.jss.2021.110961
Nima Miryeganeh, Sepehr Hashtroudi, Hadi Hemmati

Fault Localization (FL) is an important first step in software debugging and is mostly manual in current practice. Many methods have been proposed over the years to automate the FL process, including information retrieval (IR)-based techniques. These methods localize the fault based on the similarity between the bug report and the source code. Newer variations of IR-based FL (IRFL) techniques also look into the history of bug reports and leverage them during localization. However, all existing IRFL techniques limit themselves to the current project's data (local data). In this study, we introduce GloBug, an IRFL framework consisting of methods that use models pre-trained on global data (extracted from open-source benchmark projects). In GloBug, we investigate two heuristics: (a) the effect of global data on a state-of-the-art IRFL technique, namely BugLocator, and (b) the application of a Word Embedding technique (Doc2Vec) together with global data. Our large-scale experiment on 51 software projects shows that using global data improves BugLocator on average by 6.6% in terms of MRR (Mean Reciprocal Rank) and 4.8% in terms of MAP (Mean Average Precision), with gains of over 14% in a majority of the cases (64% of cases for MRR and 54% for MAP). This amount of improvement is significant compared to the improvement rates that five other state-of-the-art IRFL tools provide over BugLocator. In addition, training the models globally is a one-time offline task with no overhead on BugLocator's run-time fault localization. Our study, however, shows that a Word Embedding-based global solution did not further improve the results.
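For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how MRR and MAP are commonly computed for ranked fault-localization output. It is not taken from the paper: the function names, the sample file names, and the choice to divide average precision by the total number of buggy files are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

# One query = (files ranked by the tool, set of files actually fixed for the bug).
Query = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Return 1 / rank of the first truly buggy file, or 0.0 if none is ranked."""
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            return 1.0 / position
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Average the precision values measured at each rank holding a buggy file.

    Assumption: the sum is divided by the total number of buggy files,
    so buggy files missing from the ranking lower the score.
    """
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """MRR: mean of the reciprocal ranks over all bug reports."""
    return sum(reciprocal_rank(r, b) for r, b in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    """MAP: mean of the average precisions over all bug reports."""
    return sum(average_precision(r, b) for r, b in queries) / len(queries)


# Hypothetical rankings for two bug reports (file names are made up).
example_queries: List[Query] = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
]

if __name__ == "__main__":
    print(f"MRR = {mean_reciprocal_rank(example_queries):.3f}")    # MRR = 0.750
    print(f"MAP = {mean_average_precision(example_queries):.3f}")  # MAP = 0.667
```

In this toy example the first report's only buggy file is ranked second (reciprocal rank 0.5), and the second report's buggy files are ranked first and third (average precision (1/1 + 2/3) / 2), which yields the printed values. A relative improvement such as the 6.6% quoted above would compare these scores between two tools on the same set of bug reports.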


Updated: 2021-04-11
