On the assessment of software defect prediction models via ROC curves
Empirical Software Engineering (IF 4.1) | Pub Date: 2020-08-19 | DOI: 10.1007/s10664-020-09861-4
Sandro Morasca , Luigi Lavazza

Software defect prediction models are classifiers often built by setting a threshold t on a defect proneness model, i.e., a scoring function. For instance, they classify a software module as negative (non-faulty) if its defect proneness is below t and as positive (faulty) otherwise. Different values of t may lead to different defect prediction models, possibly with very different performance levels. Receiver Operating Characteristic (ROC) curves provide an overall assessment of a defect proneness model by taking into account all possible values of t, and thus all defect prediction models that can be built based on it. However, using a defect proneness model with a given value of t is sensible only if the resulting defect prediction model performs at least as well as some minimal performance level, which depends on practitioners' and researchers' goals and needs. We introduce a new approach and a new performance metric (the Ratio of Relevant Areas) for assessing a defect proneness model by taking into account only the parts of a ROC curve corresponding to values of t for which the resulting defect prediction models perform better than some reference value. We provide the practical motivations and theoretical underpinnings for our approach by: 1) showing how it addresses the shortcomings of existing performance metrics like the Area Under the Curve and Gini's coefficient; 2) deriving reference values based on random defect prediction policies, in addition to deterministic ones; 3) showing how the approach works with several performance metrics (e.g., Precision and Recall) and their combinations; 4) studying misclassification costs and providing a general upper bound for the cost incurred by using any defect proneness model; 5) showing the relationships between misclassification costs and performance metrics. We also carried out a comprehensive empirical study on real-life data from the SEACRAFT repository to show the differences between our metric and existing ones, and how much more reliable and less misleading our metric can be.
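To make the thresholding idea concrete, here is a minimal, self-contained Python sketch. The data is synthetic, the variable names are our own, and the "relevant threshold" rule is an illustrative reading of the Ratio of Relevant Areas, not the paper's exact definition. The sketch builds a defect prediction model for each threshold t on a defect proneness score, computes the full ROC curve and its AUC, and then recomputes the area using only those thresholds whose Precision exceeds that of a random prediction policy.

```python
import numpy as np
from sklearn.metrics import auc, precision_score, roc_curve

# Synthetic ground truth and defect proneness scores (1 = faulty module).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = np.clip(0.3 * y_true + rng.normal(0.5, 0.25, 200), 0.0, 1.0)

def predict(scores, t):
    """Defect prediction model induced by threshold t: faulty iff score >= t."""
    return (scores >= t).astype(int)

# The ROC curve collects the (FPR, TPR) pairs of all threshold-induced models.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(f"AUC (all thresholds): {auc(fpr, tpr):.3f}")

# Keep only "relevant" thresholds: those whose Precision beats the expected
# Precision of a random policy (the faulty-module rate). This mirrors the
# spirit of restricting the ROC curve, not the paper's exact RRA formula.
reference = y_true.mean()
keep = [i for i, t in enumerate(thresholds)
        if precision_score(y_true, predict(scores, t), zero_division=0) > reference]
if len(keep) >= 2:
    print(f"Area over relevant thresholds only: {auc(fpr[keep], tpr[keep]):.3f}")
```

Restricting the curve this way discards the threshold regions whose models fall below the minimal performance level, which is precisely the part of the ROC curve that, per the abstract, metrics like AUC and Gini's coefficient should not be rewarding.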

Updated: 2020-08-19