Software Defect Prediction Based on Cost-Sensitive Dictionary Learning
International Journal of Software Engineering and Knowledge Engineering (IF 0.9), Pub Date: 2019-10-10, DOI: 10.1142/s0218194019500384
Hongyan Wan, Guoqing Wu, Mali Yu, Mengting Yuan

Software defect prediction has been widely used to improve the quality of software systems. Most real software defect datasets contain far fewer defective modules than defect-free ones, and such class imbalance makes accurate prediction difficult: a model trained on imbalanced data easily misclassifies a defective module as defect-free. Because different software modules are similar to one another, a module can be represented by sparse representation coefficients over a pre-defined dictionary built from historical software defect data. In this study, we use dictionary learning to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, so that the extracted features (the sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account: we increase the penalty on misclassified defective modules during dictionary learning, which biases the classifier toward labeling a module as defective. We thus propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
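
To make the two ingredients named in the abstract more concrete (elastic-net sparse coding over a dictionary, and unequal misclassification costs), the following is a minimal Python sketch. It is not the CSDL algorithm itself: the dictionary is fixed rather than learned jointly with a classifier, and the cost-weighted residual rule in `predict` is an assumption made purely for illustration. The function names, the toy atoms, the penalty weights `lam1` and `lam2`, and the costs `cost_fn` and `cost_fp` are all hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_code(x, atoms, lam1=0.05, lam2=0.05, n_iter=200):
    """Coordinate descent for the elastic-net sparse coding problem
        min_a 0.5*||x - sum_j a[j]*atoms[j]||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2
    where each atom is one (historical) module's feature vector."""
    a = [0.0] * len(atoms)
    residual = list(x)  # equals x - sum_j a[j]*atoms[j] while a is all zeros
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(di * (ri + a[j] * di) for di, ri in zip(d, residual))
            new = soft_threshold(rho, lam1) / (sum(di * di for di in d) + lam2)
            if new != a[j]:
                residual = [ri - (new - a[j]) * di for ri, di in zip(residual, d)]
                a[j] = new
    return a


def class_residual(x, atoms, labels, code, target):
    """Squared error when x is rebuilt from the atoms of one class only."""
    recon = [0.0] * len(x)
    for d, y, c in zip(atoms, labels, code):
        if y == target:
            recon = [r + c * di for r, di in zip(recon, d)]
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def predict(x, atoms, labels, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (defective) or 0 (defect-free).

    ASSUMPTION: dividing each class residual by the cost of missing that class
    is an illustrative decision rule, not the rule used in the paper."""
    code = elastic_net_code(x, atoms)
    err_defective = class_residual(x, atoms, labels, code, 1)
    err_clean = class_residual(x, atoms, labels, code, 0)
    return 1 if err_defective / cost_fn <= err_clean / cost_fp else 0


# Toy dictionary with orthonormal atoms so the result is easy to check by hand.
ATOMS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
LABELS = [0, 0, 1]  # 0 = defect-free, 1 = defective
module = [0.5, 0.3, 0.5]  # a borderline module

if __name__ == "__main__":
    print("equal costs:", predict(module, ATOMS, LABELS, cost_fn=1.0, cost_fp=1.0))
    print("costly misses:", predict(module, ATOMS, LABELS, cost_fn=5.0, cost_fp=1.0))
```

In this toy setup the borderline module is labeled defect-free when both error types cost the same, and defective once a missed defect is made five times as costly, which mirrors the bias toward the defective class that the abstract describes.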

Last updated: 2019-10-10