A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction,IEEE Transactions on Reliability


A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction
IEEE Transactions on Reliability (IF 5.0), Pub Date: 2020-03-01, DOI: 10.1109/tr.2019.2895462
Lina Gong, Shujuan Jiang, Lili Bo, Li Jiang, Junyan Qian

Software defect prediction (SDP) is a practical way to improve test efficiency and help guarantee software reliability. However, real software projects contain far more clean instances than defective ones; this severely skews the class distribution and degrades classifier performance. Solving the class-imbalance problem in SDP has therefore attracted growing attention from both industry and academia in software engineering. In this paper, we propose a novel class-imbalance learning approach that addresses the problem in both within-project and cross-project defect prediction. We use the idea of stratification embedded in nearest neighbor (STr-NN) to produce evolving training datasets with balanced data. For within-project prediction, we apply the STr-NN approach directly. For cross-project prediction, we first introduce transfer component analysis to mitigate the distribution differences between the source and target datasets, and then apply the STr-NN approach to the transferred data. We conduct experiments on the PROMISE and NASA datasets using ensemble learning based on weighted voting. Experimental results indicate that our approach achieves higher area under the curve (AUC) and recall, with comparable probability of false alarm (pf) and F-measure, relative to several existing methods for the class-imbalance problem.
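To make the evaluation measures named in the abstract concrete, the following is a minimal sketch that computes recall, pf, and F-measure from binary predictions. It uses the standard confusion-matrix definitions, which are assumed here rather than taken from the paper; the function names and the toy labels (2 defective modules out of 10) are hypothetical. It also shows why plain accuracy is misleading on skewed data: a classifier that labels every module clean scores 0.8 accuracy while finding no defects.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), where label 1 = defective and 0 = clean."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def defect_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard definitions (assumed, not quoted from the paper)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of defects found
    pf = fp / (fp + tn) if (fp + tn) else 0.0          # probability of false alarm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {"recall": recall, "pf": pf, "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical toy data: 2 defective modules, 8 clean ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_clean = [0] * 10                         # trivial majority-class classifier
some_hits = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # one hit, one miss, one false alarm

print(defect_metrics(y_true, all_clean))
print(defect_metrics(y_true, some_hits))
```

On imbalanced data, a good result pairs high recall with low pf, which is why the abstract reports these measures alongside AUC and F-measure rather than accuracy.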


Updated: 2020-03-01
