Inconsistent Defect Labels: Essence, Causes, and Influence
IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2022-03-07, DOI: 10.1109/tse.2022.3156787
Shiran Liu 1, Zhaoqiang Guo 1, Yanhui Li 1, Chuanqi Wang 1, Lin Chen 1, Zhongbin Sun 2, Yuming Zhou 1, Baowen Xu 1

The label quality of defect data sets has a direct influence on the reliability of defect prediction models. In this paper, we conduct a systematic study of inconsistent defect labels in multi-version-project defect data sets, i.e., many instances having the same source code but different labels over multiple versions of a software project. First, we report the phenomena of inconsistent labels by real examples and analyze their essence in the context of defect prediction. Then, we uncover the causes that lead to the occurrence of inconsistent labels for the representative label collection approaches. Finally, we investigate the actual influence of inconsistent labels on defect prediction models. We find that inconsistent labels in general exist in six multi-version-project defect data sets (either widely used or the most up-to-date in the literature) collected by diverse label collection approaches. In particular, inconsistent labels in a training data set significantly reduce the prediction performance of a model, while inconsistent labels in a test data set can lead to a considerable evaluation bias on the real performance. Therefore, we recommend that: on the one hand, researchers leverage our findings to make targeted methodological improvements on existing defect label collection approaches to reduce the generation of inconsistent labels; on the other hand, practitioners detect and exclude inconsistent labels in defect data sets to avoid their potential negative influence on defect prediction.
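The abstract defines an inconsistent label as the same source code carrying different labels across versions of a project, and recommends detecting and excluding such instances. A minimal sketch of that idea follows. It is not the authors' procedure: the tuple layout `(version, name, source_code, label)`, the function names, and the choice to match instances by file name plus a SHA-256 hash of the raw source text are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def _instance_key(name, source_code):
    # Hypothetical identity for "the same source code": same module name
    # and byte-identical source text (compared via a SHA-256 digest).
    digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    return (name, digest)


def find_inconsistent_keys(instances):
    """Return keys of instances whose identical code has more than one label.

    `instances` is an iterable of (version, name, source_code, label) tuples.
    """
    labels_by_key = defaultdict(set)
    for _version, name, source_code, label in instances:
        labels_by_key[_instance_key(name, source_code)].add(label)
    return {key for key, labels in labels_by_key.items() if len(labels) > 1}


def exclude_inconsistent(instances):
    """Drop every instance that belongs to an inconsistently labeled group."""
    rows = list(instances)
    bad_keys = find_inconsistent_keys(rows)
    return [row for row in rows if _instance_key(row[1], row[2]) not in bad_keys]
```

Calling `exclude_inconsistent` on the combined instances of all versions returns only those whose identical code is labeled the same way everywhere. Exact text matching is a deliberately strict choice here; a real pipeline might normalize whitespace or comments first, which the abstract does not specify.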

Updated: 2022-03-07