Calibration of probability predictions from machine‐learning and statistical models
Global Ecology and Biogeography (IF 6.4), Pub Date: 2020-02-05, DOI: 10.1111/geb.13070, Carsten F. Dormann
AIM: Predictions from statistical models may be uncalibrated, meaning that the predicted values do not have the nominal coverage probability. This is most easily seen with probability predictions in machine‐learning classification, including the common species occurrence probabilities. Here, a predicted probability of, say, .7 should indicate that out of 100 cases with these environmental conditions, and hence the same predicted probability, the species should be present in 70 and absent in 30.

INNOVATION: A simple calibration plot shows that this is not necessarily the case, particularly not for overfitted models or algorithms that use non‐likelihood target functions. As a consequence, ‘raw’ predictions from such a model could easily be off by .2, are unsuitable for averaging across model types, and the resulting maps may hence be substantially distorted. The solution, a flexible calibration regression, is simple and can be applied whenever deviations are observed.

MAIN CONCLUSIONS: ‘Raw’, uncalibrated probability predictions should be calibrated before they are interpreted or averaged in a probabilistic way.
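The two ideas in the abstract, the calibration plot and the calibration regression, can be illustrated with a small sketch. This is not the paper's code. The abstract calls for a flexible calibration regression but does not specify its form, so the sketch assumes a simpler two-parameter logistic recalibration on the logit of the raw prediction. The function names (`calibration_table`, `fit_calibration`, `calibrate`, `simulate_overconfident`) and the simulated data are hypothetical and exist only for illustration.

```python
import math
import random


def calibration_table(pred, obs, n_bins=10):
    """Numeric form of a calibration plot: for each non-empty equal-width
    bin, return (mean predicted probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(pred, obs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def fit_calibration(pred, obs, n_iter=25):
    """Fit obs ~ Bernoulli(sigmoid(a + b * logit(pred))) by Newton-Raphson.
    Assumption: a linear-in-logit stand-in for a flexible calibration curve.
    a = 0, b = 1 would mean the raw predictions are already calibrated."""
    x = [logit(p) for p in pred]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, obs):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu              # gradient w.r.t. a
            g1 += (yi - mu) * xi       # gradient w.r.t. b
            h00 += w                   # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibrate(p, a, b):
    """Map a raw prediction to its recalibrated probability."""
    return 1 / (1 + math.exp(-(a + b * logit(p))))


def simulate_overconfident(n=5000, seed=1):
    """Hypothetical data: raw predictions are too extreme (logit doubled),
    mimicking an overfitted model."""
    rng = random.Random(seed)
    pred, obs = [], []
    for _ in range(n):
        p_true = rng.uniform(0.05, 0.95)
        pred.append(1 / (1 + math.exp(-2 * logit(p_true))))
        obs.append(1 if rng.random() < p_true else 0)
    return pred, obs


if __name__ == "__main__":
    pred, obs = simulate_overconfident()
    a, b = fit_calibration(pred, obs)
    print("intercept a, slope b:", round(a, 3), round(b, 3))
    recal = [calibrate(p, a, b) for p in pred]
    for label, values in (("raw", pred), ("recalibrated", recal)):
        print(label)
        for mean_p, freq, n in calibration_table(values, obs):
            print(f"  predicted {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

In a well-calibrated model the predicted and observed columns of the table agree within sampling noise. A fitted slope well below 1, as this simulation is constructed to produce, indicates overconfident raw predictions. In practice the recalibration would be fitted on data not used to train the model, and a spline or other smoother could replace the linear term to obtain the flexibility the abstract refers to.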
Updated: 2020-02-05