Discriminating features-based cost-sensitive approach for software defect prediction, Automated Software Engineering

Automated Software Engineering (IF 2.0), Pub Date: 2021-07-12, DOI: 10.1007/s10515-021-00289-8
Aftab Ali ₁ , Naveed Khan ₁ , Mamun Abu-Tair ₁ , Sally McClean ₁ , Ian McChesney ₁ , Joost Noppen ₂

Correlated quality metrics extracted from a source code repository can be used to build a model that automatically predicts defects in a software system. Such metrics typically yield highly imbalanced data, because a good-quality software system should contain far fewer defective instances than non-defective ones. In addition, selecting the most discriminating features significantly improves the robustness and accuracy of a prediction model. The contribution of this paper is therefore twofold. First, it selects the most discriminating features, i.e. those that help predict defects in a software component accurately. Second, it applies cost-sensitive logistic regression and decision tree ensemble-based prediction models to these features to predict defects in a software component precisely. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve (AUC), and recall. The models are evaluated on 11 datasets, and the results and analysis show that the proposed prediction models outperform the schemes in the literature.
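The abstract does not state which feature-selection criterion or misclassification costs the authors use, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It ranks metrics with a Fisher score (a swapped-in filter criterion), trains a logistic regression whose log-loss is weighted by inverse class frequency (one common way to make a model cost-sensitive), and reports accuracy, recall, and AUC, the three measures named above. The synthetic data, all function names (`fisher_scores`, `class_weights`, `fit_logreg`, `run_demo`, ...) and the hyperparameters are hypothetical; the decision tree ensemble is not sketched.

```python
import math
import random

def fisher_scores(X, y):
    """Assumed filter: Fisher score (mu1 - mu0)^2 / (var1 + var0); higher = more discriminating."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, t in zip(X, y) if t == 1]
        neg = [x[j] for x, t in zip(X, y) if t == 0]
        mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
        vp = sum((v - mp) ** 2 for v in pos) / len(pos)
        vn = sum((v - mn) ** 2 for v in neg) / len(neg)
        scores.append((mp - mn) ** 2 / (vp + vn + 1e-12))
    return scores

def select_top_k(X, scores, k):
    """Keep the k highest-scoring feature columns (indices returned in ascending order)."""
    keep = sorted(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])
    return keep, [[x[j] for j in keep] for x in X]

def class_weights(y):
    """Assumed costs w_c = n / (2 * n_c): a missed defect costs more than a false alarm."""
    n, n_pos = len(y), sum(y)
    return {1: n / (2 * n_pos), 0: n / (2 * (n - n_pos))}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, weights=None, lr=0.5, epochs=300):
    """Batch gradient descent on the (optionally class-weighted) mean log-loss."""
    weights = weights or {0: 1.0, 1: 1.0}
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    total = sum(weights[t] for t in y)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = weights[t] * (p - t)  # gradient of the weighted log-loss w.r.t. the logit
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wj - lr * g / total for wj, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def accuracy(y, pred):
    return sum(t == p for t, p in zip(y, pred)) / len(y)

def recall(y, pred):
    return sum(t == 1 and p == 1 for t, p in zip(y, pred)) / sum(y)

def auc(y, scores):
    """AUC as P(score of a random defective > score of a random clean instance); ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run_demo(seed=0):
    # Hypothetical synthetic data: 180 clean and 20 defective modules, three metrics.
    # Metric 0 is shifted for defective modules; metrics 1 and 2 are pure noise.
    rng = random.Random(seed)
    y = [0] * 180 + [1] * 20
    X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]
    keep, X_sel = select_top_k(X, fisher_scores(X, y), k=1)
    plain = fit_logreg(X_sel, y)
    costly = fit_logreg(X_sel, y, class_weights(y))
    results = {"selected": keep, "majority_accuracy": accuracy(y, [0] * len(y))}
    for name, model in (("plain", plain), ("cost_sensitive", costly)):
        probs = predict_proba(model, X_sel)
        pred = [int(p >= 0.5) for p in probs]
        results[name] = {"accuracy": accuracy(y, pred), "recall": recall(y, pred), "auc": auc(y, probs)}
    return results

if __name__ == "__main__":
    for key, value in run_demo().items():
        print(key, value)
```

Running the script prints the selected feature indices, the accuracy of a predictor that always answers "clean" (0.9 on this data, despite catching no defects), and accuracy, recall, and AUC for the plain and cost-weighted models. Because only one metric is kept and both fitted models weight it positively, they rank modules in the same order and their AUC values coincide; the class weights mainly lower the effective decision threshold, which tends to raise recall at some cost in accuracy.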

Updated: 2021-07-13
