Improving Ranking-Oriented Defect Prediction Using a Cost-Sensitive Ranking SVM
IEEE Transactions on Reliability (IF 5.0), Pub Date: 2020-03-01, DOI: 10.1109/tr.2019.2931559
Xiao Yu, Jin Liu, Jacky Wai Keung, Qing Li, Kwabena Ebo Bennin, Zhou Xu, Junping Wang, Xiaohui Cui

Context: Ranking-oriented defect prediction (RODP) ranks software modules so that limited testing resources can be allocated to each module according to its predicted number of defects. Most RODP methods overlook two issues: misranking a module with many defects leaves it with too few testing resources to find all of its defects, which costs far more than misranking a module with few defects; and the numbers of defects across software modules are highly imbalanced in defect datasets. Cost-sensitive learning is an effective technique for handling both the cost issue and the data imbalance problem in software defect prediction, but its effectiveness has not been investigated in RODP models. Aims: In this article, we propose a cost-sensitive ranking support vector machine (CSRankSVM) algorithm to improve the performance of RODP models. Method: CSRankSVM modifies the loss function of the ranking SVM algorithm by adding two penalty parameters to address both the cost issue and the data imbalance problem, and the resulting loss function is optimized using a genetic algorithm. Results: Experiments on 11 project datasets with 41 releases show that CSRankSVM achieves 1.12%–15.68% higher average fault percentile average (FPA) values than five existing RODP methods (decision tree regression, linear regression, Bayesian ridge regression, ranking SVM, and learning-to-rank (LTR)), and 1.08%–15.74% higher average FPA values than four imbalanced learning methods (random undersampling and the synthetic minority oversampling technique, two data resampling methods; RankBoost, an ensemble learning method; and IRSVM, a cost-sensitive ranking SVM for information retrieval). Conclusion: CSRankSVM handles the cost issue and the data imbalance problem in RODP and achieves better performance. Therefore, CSRankSVM is recommended as an effective method for RODP.
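The sketch below illustrates, under stated assumptions, the two ideas the abstract describes: a pairwise ranking SVM hinge loss whose misranking penalty is weighted by extra cost factors, and the FPA measure used for evaluation. The specific weighting scheme (parameters c_pos and c_imb) and the function names are illustrative assumptions, not the paper's exact formulation; the paper optimizes its loss with a genetic algorithm, which is not reproduced here.

```python
import numpy as np


def cost_sensitive_pairwise_hinge_loss(w, X, y, c_pos=2.0, c_imb=1.5, reg=0.5):
    """Pairwise hinge loss for a linear ranking model f(x) = w . x.

    For every pair (i, j) with y[i] > y[j] (module i has more defects),
    the model should satisfy w . (x_i - x_j) >= 1. Violations are
    penalized more heavily when a defect-heavy module is demoted
    (c_pos) and when the defect-count gap is large (c_imb), a rough
    proxy for the cost- and imbalance-aware penalties in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    loss = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                margin = 1.0 - np.dot(w, X[i] - X[j])
                if margin > 0:
                    # penalty grows with the fixed misranking cost and the defect gap
                    weight = c_pos * (1.0 + c_imb * (y[i] - y[j]) / (y.max() + 1e-9))
                    loss += weight * margin
    return loss + reg * np.dot(w, w)  # L2 regularization, as in a standard ranking SVM


def fault_percentile_average(y_true, scores):
    """Fault percentile average (FPA); higher is better.

    Modules are ranked by predicted score in descending order; FPA is
    the average, over k = 1..n, of the fraction of all defects
    contained in the top-k modules.
    """
    order = np.argsort(scores)[::-1]                 # descending predicted rank
    defects = np.asarray(y_true, dtype=float)[order]
    total = defects.sum()
    if total == 0:
        return 0.0
    return (np.cumsum(defects) / total).mean()


if __name__ == "__main__":
    # toy data: 5 modules, 3 features, highly skewed defect counts
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    y = np.array([0, 0, 1, 0, 7])
    w = rng.normal(size=3)
    print("loss:", cost_sensitive_pairwise_hinge_loss(w, X, y))
    print("FPA :", fault_percentile_average(y, X @ w))
```

A perfect ranking (every defect-heavy module placed above every defect-light one) drives the hinge terms to zero and maximizes FPA, which is why the two pieces fit together as training objective and evaluation measure.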

Updated: 2020-03-01