Random forest with acceptance–rejection trees
Computational Statistics (IF 1.3), Pub Date: 2019-10-29, DOI: 10.1007/s00180-019-00929-4
Peter Calhoun , Melodie J. Hallett , Xiaogang Su , Guy Cafri , Richard A. Levine , Juanjuan Fan

In this paper, we propose a new random forest method based on completely randomized splitting rules with an acceptance–rejection criterion for quality control. We show how the proposed acceptance–rejection (AR) algorithm can outperform the standard random forest (RF) algorithm and some of its variants, including extremely randomized (ER) trees and smooth sigmoid surrogate (SSS) trees. Twenty datasets were analyzed to compare prediction performance, and a simulated dataset was used to assess variable selection bias. In terms of prediction accuracy for classification problems, the proposed AR algorithm performed best, with ER second. For regression problems, RF and SSS performed best, followed by AR, with ER last. However, each algorithm was the most accurate for at least one study. We investigate scenarios where the AR algorithm can yield better predictive performance. In terms of variable importance, both RF and SSS demonstrated selection bias in favor of variables with many possible splits, while both ER and AR largely removed this bias.
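To make the idea of a completely randomized split with an acceptance–rejection check more concrete, the following is a minimal Python sketch of one node split. It is not the authors' implementation: the abstract does not state the actual acceptance criterion, so the sketch assumes a simple rule (accept a random split when its Gini impurity decrease reaches a threshold, otherwise reject it and draw again). The names `gini`, `impurity_decrease`, `ar_split`, `min_gain`, and `max_tries` are hypothetical and chosen only for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(X, y, feature, cut):
    """Decrease in Gini impurity from splitting on X[i][feature] <= cut."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= cut]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > cut]
    n = len(y)
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


def ar_split(X, y, min_gain=0.05, max_tries=50, rng=None):
    """Draw random (feature, cut) pairs until one passes the quality check.

    ASSUMPTION: the acceptance rule here (gain >= min_gain) is illustrative
    only; the paper's criterion is not described in the abstract.
    Returns (feature, cut, gain), or None if every draw was rejected.
    """
    rng = rng or random.Random(0)
    n_features = len(X[0])
    for _ in range(max_tries):
        feature = rng.randrange(n_features)   # feature chosen uniformly at random
        values = [row[feature] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:                          # constant feature cannot be split
            continue
        cut = rng.uniform(lo, hi)             # cut point chosen uniformly at random
        gain = impurity_decrease(X, y, feature, cut)
        if gain >= min_gain:                  # accept the split
            return feature, cut, gain
    return None                               # all draws rejected


if __name__ == "__main__":
    X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    y = [0, 0, 1, 1]
    print(ar_split(X, y, min_gain=0.4, max_tries=1000))
```

Because neither the feature nor the cut point is searched over, a variable with many candidate split points gets no extra chances to be selected in this sketch, which is in the spirit of the reduced selection bias reported for ER and AR. How a threshold such as `min_gain` would be set is left open here.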

Updated: 2019-10-29