An Improved Method for Training Data Selection for Cross-Project Defect Prediction
Arabian Journal for Science and Engineering (IF 2.9), Pub Date: 2021-09-04, DOI: 10.1007/s13369-021-06088-3
Nayeem Ahmad Bhat, Sheikh Umar Farooq

The selection of relevant training data significantly improves the quality of the cross-project defect prediction (CPDP) process. We propose a training data selection approach and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach (BurakMHD), a data transformation is first applied to the datasets. Then, for each instance of the target project, the k instances at minimum Hamming distance are taken from the transformed multi-source defective data and, separately, the k instances at minimum Hamming distance from the transformed multi-source non-defective data, and both sets are added to the filtered training dataset (filtered TDS). Compared to using all the cross-project data, the false positive rate decreases by 10.6%, accompanied by a 2.6% decrease in the defect detection rate, and the overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%; nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% relative to the Burak filter and by 3.2%, 5.5%, and 7.1% relative to the Peter filter. Compared to within-project prediction, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and the false positive rate decrease by 9.2% and 13.1%, respectively. In general, our approach improved performance significantly compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering the training data separately from the defective and non-defective instances of the cross-project data helps select relevant data for CPDP.
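
The selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which data transformation is used, so the sketch assumes a hypothetical per-project binarization (a metric becomes 1 if it exceeds that project's median, else 0), which makes the Hamming distance count mismatched metrics. It also assumes that a source instance chosen by several target instances is kept only once, and that ties are broken by position. All function names (`binarize_by_median`, `hamming`, `k_nearest`, `select_training_data`, `evaluate`) and the default `k=10` are hypothetical. The `evaluate` helper computes the reported measures from confusion-matrix counts using definitions common in the defect-prediction literature (pd, pf, Balance, G-measure as the harmonic mean of pd and 1 - pf, nMCC = (MCC + 1) / 2); these definitions are assumed, not taken from the paper.

```python
from math import sqrt
from statistics import median
from typing import Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]


def binarize_by_median(rows: Sequence[Sequence[float]]) -> List[Bits]:
    """ASSUMED transformation: 1 if a metric exceeds its per-project median, else 0."""
    medians = [median(col) for col in zip(*rows)]
    return [tuple(int(v > m) for v, m in zip(row, medians)) for row in rows]


def hamming(a: Bits, b: Bits) -> int:
    """Count of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_nearest(target: Bits, pool: List[Bits], k: int) -> List[int]:
    """Positions in `pool` of the k vectors closest to `target` (ties -> lower position)."""
    order = sorted(range(len(pool)), key=lambda i: (hamming(target, pool[i]), i))
    return order[:k]


def select_training_data(target_rows, source_projects, k: int = 10) -> List[int]:
    """Hypothetical BurakMHD-style filter.

    source_projects: list of (rows, labels) pairs, label 1 = defective.
    Returns sorted indices into the concatenated source rows (the filtered TDS).
    """
    target_bits = binarize_by_median(target_rows)
    source_bits: List[Bits] = []
    source_labels: List[int] = []
    for rows, labels in source_projects:
        source_bits.extend(binarize_by_median(rows))  # transform each project separately
        source_labels.extend(labels)

    # Split the pooled multi-source data by class before any neighbour search.
    groups = [
        [i for i, y in enumerate(source_labels) if y == 1],  # defective
        [i for i, y in enumerate(source_labels) if y == 0],  # non-defective
    ]
    pools = [[source_bits[i] for i in group] for group in groups]

    selected = set()  # a set, so an instance picked by several targets is kept once
    for t in target_bits:
        for group, pool in zip(groups, pools):
            for pos in k_nearest(t, pool, k):
                selected.add(group[pos])
    return sorted(selected)


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """pd, pf, Balance, G-measure and nMCC from confusion-matrix counts."""
    pd = tp / (tp + fn) if tp + fn else 0.0  # defect detection rate (recall)
    pf = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    balance = 1 - sqrt((pf ** 2 + (1 - pd) ** 2) / 2)
    g_denom = pd + (1 - pf)
    g_measure = 2 * pd * (1 - pf) / g_denom if g_denom else 0.0
    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"pd": pd, "pf": pf, "balance": balance,
            "g_measure": g_measure, "nmcc": (mcc + 1) / 2}
```

For example, with one source project `X = [[1, 1], [2, 2], [3, 3], [4, 4]]`, `y = [1, 1, 0, 0]` and target rows `[[0, 0], [9, 9]]`, `select_training_data(..., k=1)` returns `[0, 2]`: each target instance contributes one defective and one non-defective neighbour, and duplicates collapse. Splitting the pool by class before searching guarantees that every target instance contributes examples of both classes, which is the property the abstract credits for selecting relevant data; a single search over the pooled data (as in the Burak filter) can return neighbours of only one class.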

Updated: 2021-09-04