A residual-based approach for robust random forest regression
Statistics and Its Interface (IF 0.3), Pub Date: 2021-07-08, DOI: 10.4310/20-sii660
Andrew J. Sage, Ulrike Genschel, Dan Nettleton

We introduce a novel robust approach for random forest regression that is useful when the conditional distribution of the response variable, given predictor values, is contaminated. Residual analysis is used to identify unusual response values in training data, and the contributions of these values are down-weighted accordingly. This approach is motivated by a robust fitting procedure first proposed in the context of locally weighted polynomial regression and scatterplot smoothing. We demonstrate that tuning the parameter in the robustness algorithm using a weighted cross-validation approach is advantageous when contamination is suspected in training data responses. We conduct extensive simulations, comparing our method to existing robust approaches, some of which have not been compared to one another in prior studies. Our approach outperforms existing techniques on noisy training datasets with response contamination. While no approach is uniformly optimal, ours is consistently competitive with the best existing approaches for robust random forest regression.
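The abstract describes the procedure only at a high level, so the following is one possible reading of it, not the authors' implementation. It assumes Cleveland's (1979) LOWESS robustness step: bisquare weights on residuals scaled by c times the median absolute residual, with c = 6 as the default. It also uses out-of-bag (OOB) residuals from scikit-learn's `RandomForestRegressor` to flag unusual responses and applies the down-weighting through the real `sample_weight` argument of `fit`. Treating c as the tuned parameter, and the weighted cross-validation scheme (held-out squared errors weighted by robustness weights from a pilot fit), are assumptions for illustration. All function names (`bisquare`, `robustness_weights`, `robust_forest`, `weighted_cv_error`, `select_c`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def bisquare(u):
    """Tukey bisquare: (1 - u^2)^2 for |u| < 1, and 0 otherwise."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)


def robustness_weights(resid, c=6.0):
    """LOWESS-style weights: bisquare(resid / (c * median|resid|)).
    c = 6 follows Cleveland (1979); treating c as the tuning parameter is an assumption."""
    s = np.median(np.abs(resid))
    if s == 0.0:
        return np.ones(len(resid))
    return bisquare(np.asarray(resid, dtype=float) / (c * s))


def _fit(X, y, w, n_trees, seed):
    # Down-weighting via sample_weight (a real fit() argument):
    # rows with weight 0 contribute nothing to splits or leaf values.
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True, random_state=seed)
    return rf.fit(X, y, sample_weight=w)


def robust_forest(X, y, c=6.0, n_iter=2, n_trees=100, seed=0):
    """Plain fit, then n_iter refits with weights computed from OOB residuals."""
    w = np.ones(len(y))
    rf = _fit(X, y, w, n_trees, seed)
    for _ in range(n_iter):
        w = robustness_weights(y - rf.oob_prediction_, c)
        rf = _fit(X, y, w, n_trees, seed)
    return rf, w  # w holds the weights used in the final fit


def weighted_cv_error(X, y, c, eval_w, k=3, seed=0):
    """k-fold CV where each held-out squared error is weighted by eval_w,
    so suspected outliers barely count (an assumed scheme)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    num = den = 0.0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        rf, _ = robust_forest(X[train], y[train], c=c, seed=seed)
        sq_err = (y[test] - rf.predict(X[test])) ** 2
        num += np.sum(eval_w[test] * sq_err)
        den += np.sum(eval_w[test])
    return num / den


def select_c(X, y, candidates=(4.0, 6.0, 9.0), k=3, seed=0):
    """Pick the c with the smallest weighted CV error; eval_w comes from a
    pilot robust fit with the default c (also an assumption)."""
    pilot, _ = robust_forest(X, y, seed=seed)
    eval_w = robustness_weights(y - pilot.oob_prediction_)
    errors = {c: weighted_cv_error(X, y, c, eval_w, k, seed) for c in candidates}
    return min(errors, key=errors.get), errors


# Hypothetical toy data: a smooth signal with 10% of responses shifted by +10.
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)
bad = rng.choice(200, 20, replace=False)
y[bad] += 10.0

rf, w = robust_forest(X, y)  # contaminated rows should get (near-)zero weight
best_c, errs = select_c(X, y)
```

OOB rather than in-sample residuals are used because fully grown trees nearly reproduce each training response, so a contaminated point would otherwise get a residual close to zero and escape the reweighting step.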

Updated: 2021-07-09