The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models,IEEE Transactions on Software Engineering


The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2020-11-01, DOI: 10.1109/tse.2018.2876537
Chakkrit Tantithamthavorn, Ahmed E. Hassan, Kenichi Matsumoto

Defect models that are trained on class-imbalanced datasets (i.e., datasets in which defective and clean modules are not equally represented) are highly susceptible to producing inaccurate predictions. Prior research compares the impact of class rebalancing techniques on the performance of defect models but arrives at contradictory conclusions because the studies use different datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines on whether class rebalancing techniques should be applied in the context of defect models. In this paper, we investigate the impact of class rebalancing techniques on the performance measures and interpretation of defect models. We also investigate the experimental settings in which class rebalancing techniques are beneficial for defect models. Through a case study of 101 datasets that span proprietary and open-source systems, we conclude that the impact of class rebalancing techniques on the performance of defect prediction models depends on the performance measure and the classification technique that are used. We observe that the optimized SMOTE technique and the under-sampling technique are beneficial when quality assurance teams wish to increase AUC and Recall, respectively, but they should be avoided when deriving knowledge and understanding from defect models.
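To make the terms in the abstract concrete, the following is a minimal sketch of random under-sampling and of the Recall measure, using only the Python standard library. It is not the authors' implementation or experimental setup: the function names `undersample` and `recall`, the toy data, and the choice to balance the classes to exactly equal size are all assumptions made for illustration.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration; labels are 1 (defective) or 0 (clean).
    """
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    # The shorter index list is the minority class, the longer the majority.
    minority, majority = sorted((defective, clean), key=len)
    # Keep every minority row plus an equally sized random majority sample.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def recall(y_true, y_pred):
    """Share of truly defective modules that are predicted as defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data, made up for illustration: 2 defective modules and 8 clean ones,
# each described by a single metric (lines of code).
rows = [[loc] for loc in (900, 750, 120, 80, 60, 200, 150, 90, 40, 30)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_rows, balanced_labels = undersample(rows, labels)
```

In this sketch the balanced set keeps both defective modules and two randomly chosen clean ones. Rebalancing of this kind is normally applied to the training data only, so that performance measures such as Recall are still computed on data with the original class distribution.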


Updated: 2020-11-01
