Cost-sensitive Dictionary Learning for Software Defect Prediction
Neural Processing Letters (IF 3.1), Pub Date: 2020-09-25, DOI: 10.1007/s11063-020-10355-z
Liang Niu, Jianwu Wan, Hongyuan Wang, Kaiwei Zhou

In recent years, software defect prediction has been recognized as a cost-sensitive learning problem. To deal with the unequal misclassification losses resulting from different classification errors, several cost-sensitive dictionary learning methods have recently been proposed. These methods typically define misclassification costs to measure the unequal losses and then minimize a cost-sensitive reconstruction loss by embedding the cost information into the reconstruction function of dictionary learning. Although they have achieved promising performance, their cost-sensitive reconstruction functions are not well designed. In addition, insufficient attention has been paid to the coding coefficients, which can also help reduce the reconstruction loss. To address these issues, this paper proposes a new cost-sensitive reconstruction loss function and introduces an additional cost-sensitive discrimination regularization for the coding coefficients. The two terms are jointly optimized in a unified cost-sensitive dictionary learning framework. Doing so minimizes the reconstruction loss and yields a more cost-sensitive dictionary for feature encoding of test data. We conducted extensive experiments on twenty-five software projects from four benchmark datasets: NASA, AEEEM, ReLink and Jureczko. The results, in comparison with ten state-of-the-art software defect prediction methods, demonstrate the effectiveness of the learned cost-sensitive dictionary for software defect prediction.
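
To make the idea of unequal misclassification costs concrete, the following is a minimal sketch and not the formulation proposed in the paper. It assumes one small dictionary per class, uses ridge-regularized coding in place of sparse coding, converts the class-wise reconstruction residuals into soft scores, and picks the label with the lowest expected cost. The function names, the `lam` parameter, the toy dictionaries and the cost values are all hypothetical.

```python
import numpy as np

def ridge_code(D, x, lam=0.1):
    """Coding coefficients of x over dictionary D (columns are atoms).
    Assumption: l2-regularized least squares stands in for sparse coding."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)

def cost_sensitive_predict(x, dicts, cost, lam=0.1):
    """dicts[c] is the dictionary of class c.
    cost[i, j] is the cost of predicting class j when the true class is i."""
    # Squared reconstruction residual of x under each class dictionary.
    residuals = np.array([
        np.linalg.norm(x - D @ ridge_code(D, x, lam)) ** 2 for D in dicts
    ])
    # Smaller residual -> larger score; normalize scores to sum to one.
    scores = np.exp(-(residuals - residuals.min()))
    post = scores / scores.sum()
    # expected_cost[j] = sum_i post[i] * cost[i, j]
    expected_cost = post @ cost
    return int(np.argmin(expected_cost)), expected_cost

# Toy setup (hypothetical): class 0 = clean module, class 1 = defective module.
dicts = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
x = np.array([1.0, 0.1])  # reconstructed better by the "clean" dictionary

equal_cost = np.array([[0.0, 1.0], [1.0, 0.0]])
skewed_cost = np.array([[0.0, 1.0], [5.0, 0.0]])  # missing a defect costs 5x

label_equal, _ = cost_sensitive_predict(x, dicts, equal_cost)
label_skewed, _ = cost_sensitive_predict(x, dicts, skewed_cost)
print(label_equal, label_skewed)
```

In this toy setup the same sample is labeled clean under equal costs and defective once a missed defect is made five times as expensive as a false alarm, which is the behaviour a cost-sensitive predictor is meant to exhibit.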




Updated: 2020-09-25