Evaluation of Six Methods for Correcting Bias in Estimates from Ensemble Tree Machine Learning Regression Models



Environmental Modelling & Software (IF 4.9), Pub Date: 2021-02-24, DOI: 10.1016/j.envsoft.2021.105006
K. Belitz, P.E. Stackelberg

Ensemble-tree machine learning (ML) regression models can be prone to systematic bias: small values are overestimated and large values are underestimated. Additional bias can be introduced when the dependent variable is a transform of the original data (for example, log-concentration) and the estimates are retransformed to the original units. Six methods were evaluated for their ability to correct both the systematic bias and the introduced bias. Method performance was evaluated using four case studies of groundwater quality; the dependent variable was pH in two case studies and log-concentration in the other two. When the performance metrics (bias and RMSE, computed both for individual points and for the cumulative distribution function, CDF) were expressed in the same units as those used in the ML model, empirical distribution matching (EDM) provided the best results. When the metrics were computed using retransformed concentrations, EDM and a method incorporating Duan's smearing estimate were both effective. A method based on the Z-score transform approximates EDM when the correlation coefficient between the rank-ordered ML estimates and the rank-ordered observations approaches one.
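The three ideas the abstract names can be illustrated with a minimal NumPy sketch. It assumes EDM works as rank-based quantile mapping (each estimate is replaced by the observation at the same rank, with linear interpolation between ranks), that the Z-score method rescales estimates to the observed mean and standard deviation, and that the log base is natural. The function names `edm_correct`, `zscore_correct`, and `duan_smearing` and the toy data are hypothetical. This is not the authors' code, and the paper's "method incorporating Duan's smearing estimate" may combine smearing with other steps that are not shown here.

```python
import numpy as np


def edm_correct(train_est, train_obs, new_est):
    """Empirical distribution matching, sketched as rank-based quantile mapping.

    Assumption: paired training arrays of equal length. Each new estimate is
    located among the sorted training estimates and mapped to the observation
    at the same rank, interpolating linearly between ranks.
    """
    est_sorted = np.sort(np.asarray(train_est, dtype=float))
    obs_sorted = np.sort(np.asarray(train_obs, dtype=float))
    # np.interp clamps values outside the training range to the end points.
    return np.interp(np.asarray(new_est, dtype=float), est_sorted, obs_sorted)


def zscore_correct(train_est, train_obs, new_est):
    """Rescale estimates so their mean and standard deviation match the observations.

    Assumption: population standard deviation (ddof=0); the ratio of the two
    standard deviations is the same with ddof=1.
    """
    est = np.asarray(train_est, dtype=float)
    obs = np.asarray(train_obs, dtype=float)
    z = (np.asarray(new_est, dtype=float) - est.mean()) / est.std()
    return obs.mean() + z * obs.std()


def duan_smearing(train_log_est, train_log_obs, new_log_est):
    """Retransform natural-log estimates using Duan's smearing factor.

    The factor is the mean of exp(residual) over the training data, with
    residual = log(observed) - log(estimated). For base-10 logs, replace
    np.exp with 10.0 ** (...).
    """
    resid = np.asarray(train_log_obs, dtype=float) - np.asarray(train_log_est, dtype=float)
    smear = np.mean(np.exp(resid))
    return np.exp(np.asarray(new_log_est, dtype=float)) * smear


# Toy example (hypothetical data): estimates are compressed toward the mean,
# so small values are overestimated and large values are underestimated.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
est = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
print(edm_correct(est, obs, est))
print(zscore_correct(est, obs, est))
print(duan_smearing([0.0, 0.0], [0.1, -0.1], [0.0]))
```

In the toy data the sorted estimates are an exact linear function of the sorted observations, so their correlation is one. The EDM and Z-score corrections therefore return the same values, which matches the abstract's final sentence. With noisier real data the two would generally differ. For the toy log-space residuals of ±0.1, the smearing factor is cosh(0.1), which is slightly above one, so the smearing estimate lands a little above the naive `np.exp` back-transform.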


Updated: 2021-02-25
