A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-03-15, DOI: 10.1109/tse.2022.3158831
Liyan Song, Leandro L. Minku

Just-In-Time Software Defect Prediction (JIT-SDP) uses machine learning to predict whether software changes are defect-inducing or clean. When adopting JIT-SDP, changes in the underlying defect-generating process may significantly affect the predictive performance of JIT-SDP models over time. Therefore, being able to continuously track the predictive performance of JIT-SDP models during the software development process is of utmost importance for software companies to decide whether or not to trust the predictions provided by such models over time. However, there has been little discussion on how to continuously evaluate predictive performance in practice, and such evaluation is not straightforward. In particular, labeled software changes that can be used for evaluation arrive over time with a delay, which in part corresponds to the time we have to wait before labeling software changes as ‘clean’ (waiting time). A clean label assigned based on a given waiting time may not correspond to the true label of the software change. This can potentially hinder the validity of any continuous predictive performance evaluation procedure for JIT-SDP models. This paper provides the first discussion of how to continuously evaluate the predictive performance of JIT-SDP models over time during the software development process, and the first investigation of whether and to what extent waiting time affects the validity of such a continuous performance evaluation procedure in JIT-SDP. Based on 13 GitHub projects, we found that waiting time had a significant impact on validity. Though typically small, the differences in estimated predictive performance were sometimes large, and thus inappropriate choices of waiting time can lead to misleading estimates of predictive performance over time. Such impact did not normally change the ranking of JIT-SDP models, and thus conclusions about which JIT-SDP model performs better are likely reliable regardless of the choice of waiting time, especially when considered across projects.
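To make the waiting-time mechanism concrete, below is a minimal Python sketch of one plausible continuous evaluation loop; it is an illustration under assumptions, not the paper's exact procedure. The Change class, evaluate_stream, the 90-day default waiting time, and the fading factor theta are all hypothetical names and values. The idea: a change counts as defect-inducing as soon as a defect fix links back to it, and is provisionally labeled clean once the waiting time elapses without such a link, after which it updates a fading-factor performance estimate.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    ts: float                                 # commit timestamp, in days
    predicted: int                            # model prediction: 1 = defect-inducing, 0 = clean
    defect_found_at: Optional[float] = None   # time a defect fix linked back, if any

def evaluate_stream(changes, waiting_time_days=90.0, theta=0.99):
    """Yield (timestamp, G-mean estimate) whenever a new label becomes available.

    theta is a fading factor that weights recent changes more heavily,
    so the estimate tracks performance over time, not a global average.
    """
    pending = []            # changes still waiting to be labeled
    r0 = r1 = 0.0           # faded correct-prediction counts per class
    n0 = n1 = 0.0           # faded totals per class (0 = clean, 1 = defect-inducing)
    for incoming in changes:
        pending.append(incoming)
        now = incoming.ts
        still_pending = []
        for c in pending:
            if c.defect_found_at is not None and c.defect_found_at <= now:
                label = 1   # a defect fix linked back: confirmed defect-inducing
            elif now - c.ts >= waiting_time_days:
                label = 0   # provisionally clean; wrong if a fix arrives later
            else:
                still_pending.append(c)
                continue
            hit = float(c.predicted == label)
            if label == 1:
                r1, n1 = theta * r1 + hit, theta * n1 + 1.0
            else:
                r0, n0 = theta * r0 + hit, theta * n0 + 1.0
            if n0 > 0 and n1 > 0:
                yield now, ((r0 / n0) * (r1 / n1)) ** 0.5   # geometric mean of per-class recalls
        pending = still_pending

A shorter waiting time in this sketch makes labels (and hence the performance curve) available sooner but mislabels more defect-inducing changes as clean, which is precisely the validity trade-off the paper investigates.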

Updated: 2022-03-15