Heterogeneous defect prediction with two-stage ensemble learning
Automated Software Engineering (IF 2.0), Pub Date: 2019-06-04, DOI: 10.1007/s10515-019-00259-1
Zhiqiang Li, Xiao-Yuan Jing, Xiaoke Zhu, Hongyu Zhang, Baowen Xu, Shi Ying

Heterogeneous defect prediction (HDP) refers to predicting defect-prone software modules in one project (the target) using heterogeneous data collected from other projects (the sources). Several HDP methods have been proposed recently. However, these methods do not sufficiently account for two characteristics of defect data: (1) the data may be linearly inseparable, and (2) the data may be highly imbalanced. These two characteristics make it challenging to build an effective HDP model. In this paper, we propose a novel Two-Stage Ensemble Learning (TSEL) approach to HDP, which consists of an ensemble multi-kernel domain adaptation (EMDA) stage and an ensemble data sampling (EDS) stage. In the EMDA stage, we develop an Ensemble Multiple Kernel Correlation Alignment (EMKCA) predictor, which combines the advantages of multiple kernel learning and domain adaptation techniques. In the EDS stage, we employ the RESample with replacement (RES) technique to learn multiple different EMKCA predictors and combine them by averaging their outputs. Together, the two stages create an ensemble of defect predictors. Extensive experiments on 30 public projects show that the proposed TSEL approach outperforms a range of competing methods, with improvements of 20.14–33.92% in AUC, 36.05–54.78% in F-measure, and 5.48–19.93% in balance.
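
The EDS stage described in the abstract (resample with replacement, train several predictors, average their outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CentroidScorer` base learner is a hypothetical stand-in for the EMKCA predictor, the class-balanced form of the resampling is an assumption about how RES addresses imbalance, and every function name and parameter here is invented for illustration.

```python
import random
from statistics import mean


def resample_balanced(X, y, rng):
    """Draw a bootstrap sample with replacement, with equal counts per class.

    Assumption: balancing the classes is one plausible reading of RES;
    the abstract only states that sampling is done with replacement.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = max(len(pos), len(neg))
    idx = rng.choices(pos, k=n) + rng.choices(neg, k=n)
    return [X[i] for i in idx], [y[i] for i in idx]


class CentroidScorer:
    """Hypothetical placeholder for an EMKCA predictor.

    Scores a module by its relative distance to the two class centroids.
    """

    def fit(self, X, y):
        self.c1 = self._centroid([x for x, label in zip(X, y) if label == 1])
        self.c0 = self._centroid([x for x, label in zip(X, y) if label == 0])
        return self

    @staticmethod
    def _centroid(rows):
        return [mean(col) for col in zip(*rows)]

    @staticmethod
    def _dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    def predict_score(self, x):
        d1 = self._dist(x, self.c1)
        d0 = self._dist(x, self.c0)
        # Closer to the defective centroid gives a score nearer to 1.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def fit_ensemble(X, y, n_models=10, seed=0):
    """Train one base predictor per resampled training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = resample_balanced(X, y, rng)
        models.append(CentroidScorer().fit(Xb, yb))
    return models


def predict_average(models, x):
    """Average ensemble: mean of the individual predictors' scores."""
    return mean(m.predict_score(x) for m in models)
```

Each predictor sees a different resampled view of the training data, so averaging their scores smooths out the variance introduced by any single sample. In the paper's setting the base learner would be an EMKCA predictor trained on domain-adapted source data rather than the toy scorer used here.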

Updated: 2019-06-04