Do different cross-project defect prediction methods identify the same defective modules?, Journal of Software: Evolution and Process


Journal of Software: Evolution and Process (IF 1.7), Pub Date: 2019-10-31, DOI: 10.1002/smr.2234
Xiang Chen [1,2,3], Yanzhou Mu [1,4], Yubin Qu [5], Chao Ni [2], Meng Liu [1], Tong He [1], Shangqing Liu [3]


Cross-project defect prediction (CPDP) is needed when a target project is new or has little training data, since such projects lack sufficient historical data to build high-quality prediction models. Researchers have proposed many CPDP methods, and previous studies have extensively compared their performance. However, to the best of our knowledge, it remains unclear whether different CPDP methods identify the same defective modules, and this issue has not been thoroughly explored. In this article, we select 12 state-of-the-art CPDP methods: eight supervised and four unsupervised. We first compare the performance of these methods under the same experimental settings on five widely used datasets (i.e., NASA, SOFTLAB, PROMISE, AEEEM, and ReLink) and rank them via the Scott-Knott test. The results confirm the competitiveness of unsupervised methods. We then perform a diversity analysis on defective modules for these methods using the McNemar test. The empirical results show that different CPDP methods may differ in the modules they predict as defective, especially when supervised methods are compared with unsupervised methods. Finally, we find that a certain number of defective modules either cannot be correctly identified by any of the CPDP methods or can be correctly identified by only one. These findings can be used to design more effective methods that further improve CPDP performance.
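To make the diversity analysis concrete, the following is a minimal sketch of how two methods' predictions might be compared with a continuity-corrected McNemar test, and how defective modules found by no method or by exactly one method might be counted. It is an illustration under stated assumptions, not the authors' procedure: the function names `mcnemar` and `detection_counts`, the method labels, and the toy labels are all hypothetical, and the abstract does not say which variant of the test was used.

```python
import math


def mcnemar(actual, pred_a, pred_b):
    """Continuity-corrected McNemar test on the disagreements of two classifiers.

    Returns (statistic, p_value). Only discordant modules matter:
    b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for y, pa, pb in zip(actual, pred_a, pred_b) if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(actual, pred_a, pred_b) if pa != y and pb == y)
    if b + c == 0:
        # No discordant modules: the two methods are indistinguishable here.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def detection_counts(actual, predictions):
    """For each truly defective module, count how many methods flagged it."""
    return [
        sum(preds[i] for preds in predictions.values())
        for i, y in enumerate(actual)
        if y == 1
    ]


# Toy data (hypothetical): 1 = defective, 0 = clean.
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "supervised_a": [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],    # hypothetical method
    "unsupervised_b": [0, 0, 1, 1, 0, 0, 0, 1, 0, 0],  # hypothetical method
}

stat, p_value = mcnemar(
    actual, predictions["supervised_a"], predictions["unsupervised_b"]
)
counts = detection_counts(actual, predictions)
missed_by_all = counts.count(0)
found_by_one = counts.count(1)
print(stat, p_value, missed_by_all, found_by_one)
```

With very few discordant modules, the chi-square approximation is unreliable and an exact binomial variant of the test is usually preferred; the sketch keeps the approximate form only for brevity.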


Updated: 2019-10-31
