Just-in-time defect prediction for software hunks, Software: Practice and Experience

Software: Practice and Experience (IF 3.5), Pub Date: 2021-06-16, DOI: 10.1002/spe.3001
Xiaoyan Zhu¹, Chenyu Yan¹, E. James Whitehead², Binbin Niu³, Lei Zhu⁴, Long Pan⁵

Just-in-time defect prediction can prompt software developers and managers to verify and fix bugs at the moment they appear, improving the effectiveness and validity of bug fixing. Existing studies mainly focus on just-in-time defect prediction for software files (JIT-F). JIT-F is a binary classification problem: it classifies (and hence predicts) a file change as buggy or clean. This article provides a detailed analysis of just-in-time defect prediction for software hunks (JIT-H), that is, for the contiguous blocks of changed lines within a file change. JIT-H predicts bugs at this finer level of granularity and hence further improves the efficiency of bug fixing. Classification is performed with bagging, an ensemble technique that aggregates combinations of random undersampling and multiple classifiers (J48 and Random Forest). An empirical study on 10 open-source projects was conducted to validate the effectiveness of JIT-H. Experimental results show that JIT-H is effective at predicting defects in software hunk changes and is more cost-effective than JIT-F. Additionally, an analysis of the change features indicates that Text Vector features and hunk-change-level features are more important than features in other groups and levels.
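
The abstract describes the classifier only at a high level. One plausible reading of "bagging with random undersampling" is sketched below: each bag keeps every buggy hunk, draws an equal number of clean hunks at random, bootstraps that balanced sample, and trains a base learner; the bags then vote. This is a minimal, stdlib-only sketch under stated assumptions, not the paper's implementation. A one-feature decision stump stands in for J48 and Random Forest; the two features (lines added, lines deleted), the toy data, the bag count, and all function names are hypothetical and are not the paper's Text Vector or hunk-level features.

```python
import random
from collections import Counter

# Hypothetical toy data: each row is [lines_added, lines_deleted] for one hunk.
# Label 1 = buggy, 0 = clean. Buggy hunks are the (imbalanced) minority class.
CLEAN = [[1, 2], [2, 1], [3, 4], [2, 2], [4, 3], [5, 1], [3, 3], [6, 2],
         [7, 5], [8, 1], [1, 1], [4, 4], [5, 2], [6, 6], [2, 5]]
BUGGY = [[15, 3], [18, 1], [20, 4], [16, 2], [19, 5]]
X = CLEAN + BUGGY
y = [0] * len(CLEAN) + [1] * len(BUGGY)


def train_stump(rows, labels):
    """Fit a one-feature threshold rule by minimizing training errors.
    Stand-in base learner; NOT J48 or Random Forest."""
    best = None  # (errors, feature, threshold, label_if_above)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            for above in (0, 1):
                errors = sum((above if row[f] > t else 1 - above) != label
                             for row, label in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above


def undersample(rows, labels, rng):
    """Random undersampling: keep every minority-class row plus an
    equal-size random subset of majority-class rows."""
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    idx = minority + rng.sample(majority, len(minority))
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_underbagging(rows, labels, n_bags=11, seed=0):
    """Train one base learner per bag on a bootstrap of a balanced sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        bal_rows, bal_labels = undersample(rows, labels, rng)
        # Bootstrap (sample with replacement) the balanced set, as in bagging.
        picks = rng.choices(range(len(bal_labels)), k=len(bal_labels))
        models.append(train_stump([bal_rows[i] for i in picks],
                                  [bal_labels[i] for i in picks]))
    return models


def predict(models, row):
    """Aggregate the bags by majority vote (ties go to 'clean')."""
    votes = Counter(model(row) for model in models)
    return 1 if votes[1] > votes[0] else 0


models = fit_underbagging(X, y, n_bags=11, seed=42)
print(predict(models, [1, 3]), predict(models, [25, 1]))
```

Undersampling inside each bag keeps every base learner from being swamped by the clean majority, while bagging recovers some of the information discarded by undersampling, because different bags see different clean hunks. The abstract does not state the bag count, sampling ratio, or how J48 and Random Forest are combined, so those choices here are illustrative only; in practice a library ensemble such as imbalanced-learn's `BalancedBaggingClassifier` wrapping a real tree learner would be a more realistic starting point.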

Updated: 2021-06-16