Leveraging developer information for efficient effort-aware bug prediction, Information and Software Technology

Leveraging developer information for efficient effort-aware bug prediction
Information and Software Technology (IF 3.8), Pub Date: 2021-04-26, DOI: 10.1016/j.infsof.2021.106605
Yu Qu, Jianlei Chi, Heng Yin

Context:

Software bug prediction techniques can provide informative guidance in software engineering practice. Over the past 15 years, developer information has been used intensively in bug prediction, either directly as features or as a basic data source for constructing other useful models.

Objective:

To further leverage developer information, from a new and straightforward perspective, in order to improve effort-aware bug prediction.

Methods:

We investigate the direct relation between the number of developers working on a file and the probability that the file is buggy. In an empirical study of nine open-source Java systems comprising 32 versions, we observe a widespread and interesting tendency: the more developers work on a source file, the more likely that file is to be buggy. Based on this tendency, we propose an unsupervised algorithm and a supervised equation, both called top-dev, to improve effort-aware bug prediction. The key idea is to move files with a large number of developers higher in the suspicious file list generated by effort-aware models.
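
The abstract does not spell out the top-dev algorithm, so the following is only a minimal sketch of the two ideas described above, under stated assumptions. `buggy_rate_by_dev_count` shows one way to check the observed tendency, and `prioritize_by_developers` shows one plausible way to move files with many developers to the front of a suspicious file list. The `FileRecord` type, the `min_devs` threshold, the use of lines of code as the effort measure, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    loc: int       # lines of code, used here as a stand-in for inspection effort
    n_devs: int    # number of distinct developers who changed the file
    buggy: bool    # ground-truth label, needed only for the empirical check


def buggy_rate_by_dev_count(files):
    """Return the share of buggy files for each developer count."""
    totals = defaultdict(int)
    buggy = defaultdict(int)
    for f in files:
        totals[f.n_devs] += 1
        buggy[f.n_devs] += int(f.buggy)
    return {n: buggy[n] / totals[n] for n in sorted(totals)}


def prioritize_by_developers(ranked, min_devs):
    """Move files with at least `min_devs` developers to the front.

    `ranked` is a suspicious file list produced by any effort-aware model.
    The relative order inside each group is preserved (stable partition).
    `min_devs` is a hypothetical threshold, not a value from the paper.
    """
    front = [f for f in ranked if f.n_devs >= min_devs]
    rest = [f for f in ranked if f.n_devs < min_devs]
    return front + rest


# Toy data, invented purely for illustration.
files = [
    FileRecord("A.java", loc=120, n_devs=1, buggy=False),
    FileRecord("B.java", loc=300, n_devs=4, buggy=True),
    FileRecord("C.java", loc=80, n_devs=1, buggy=False),
    FileRecord("D.java", loc=200, n_devs=4, buggy=True),
    FileRecord("E.java", loc=150, n_devs=2, buggy=True),
    FileRecord("F.java", loc=90, n_devs=2, buggy=False),
]

rates = buggy_rate_by_dev_count(files)

# ManualUp-style baseline, assumed here to mean "smaller files first".
baseline = sorted(files, key=lambda f: f.loc)

# Re-rank: files touched by many developers are inspected first.
reranked = prioritize_by_developers(baseline, min_devs=3)
reranked_paths = [f.path for f in reranked]
```

Because the partition is stable, the baseline model still decides the order within each group; the developer count only decides which group is inspected first. A real implementation would need the paper's actual rule for how many files to promote and how far.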

Results:

Experimental results show that the proposed top-dev algorithm and equation significantly outperform the unsupervised and supervised baseline models (ManualUp, $R_{ad}$, $R_{dd}$, $R_{ee}$, CBS+, and top-core). Moreover, the unsupervised top-dev algorithm is comparable or superior to existing supervised baseline models.

Conclusion:

The proposed approaches are useful in effort-aware bug prediction practice. Practitioners can use the top-dev algorithm to generate a high-quality, informative suspicious file list without training complex machine learning classifiers. When building a supervised bug prediction model, the best practice is to combine existing models with the top-dev equation.
