The effect of splitting on random forests
Machine Learning (IF 4.3), Pub Date: 2014-07-02, DOI: 10.1007/s10994-014-5451-2
Hemant Ishwaran

The effect of a splitting rule on random forests (RF) is systematically studied for regression and classification problems. A class of weighted splitting rules, which includes as special cases CART weighted variance splitting and Gini index splitting, is studied in detail and shown to possess a unique adaptive property to signal and noise. We show that for noisy variables weighted splitting favors end-cut splits, i.e., it exhibits an end-cut preference (ECP). While end-cut splits have traditionally been viewed as undesirable for single trees, we argue that for deeply grown trees (a trademark of RF) end-cut splitting is useful because: (a) it maximizes the sample size, making it possible for a tree to recover from a bad split, and (b) if a branch repeatedly splits on noise, the tree's minimal node size will be reached, which promotes termination of the bad branch. For strong variables, weighted variance splitting is shown to possess the desirable property of splitting at points of curvature of the underlying target function. This adaptivity to both noise and signal does not hold for unweighted and heavy weighted splitting rules. These latter rules are either too greedy, making them poor at recognizing noisy scenarios, or overly ECP aggressive, making them poor at recognizing signal. These results also shed light on pure random splitting and show that such rules are the least effective. On the other hand, because randomized rules are desirable for their computational efficiency, we introduce a hybrid method employing random split-point selection which retains the adaptive property of weighted splitting rules while remaining computationally efficient.
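To make the weighted splitting criterion concrete, here is a minimal sketch (not the paper's implementation) of CART weighted variance splitting for a single regression variable, together with a randomized variant that evaluates the same criterion only at a few random candidate split points. The randomized variant merely illustrates the general idea of random split-point selection, not the paper's exact hybrid rule; the function names, the n_candidates parameter, and the synthetic example are all illustrative and do not come from the paper.

```python
import numpy as np


def weighted_variance_split(x, y):
    """CART weighted variance splitting for a single regression variable.

    Scans every distinct split point s and returns the one minimizing the
    weighted within-node variance
        (n_L / n) * Var(y | x <= s) + (n_R / n) * Var(y | x > s).
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    best_s, best_impurity = None, np.inf
    for i in range(1, n):  # candidate cut between positions i-1 and i
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # tied x values give no usable split point here
        left, right = y_sorted[:i], y_sorted[i:]
        impurity = (i / n) * left.var() + ((n - i) / n) * right.var()
        if impurity < best_impurity:
            best_impurity = impurity
            best_s = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_s, best_impurity


def random_split_point_variant(x, y, n_candidates=10, seed=None):
    """Evaluate the same weighted-variance criterion, but only at a small
    random set of candidate split points (an illustration of random
    split-point selection, not the paper's exact hybrid method)."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(x.min(), x.max(), size=n_candidates)
    n = len(y)
    best_s, best_impurity = None, np.inf
    for s in candidates:
        left_mask = x <= s
        n_left = int(left_mask.sum())
        if n_left == 0 or n_left == n:
            continue  # degenerate split, skip
        impurity = (n_left / n) * y[left_mask].var() \
            + ((n - n_left) / n) * y[~left_mask].var()
        if impurity < best_impurity:
            best_impurity, best_s = impurity, s
    return best_s, best_impurity


if __name__ == "__main__":
    # Synthetic illustration only: one informative variable plus noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 500)
    y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=500)
    print(weighted_variance_split(x, y))
    print(random_split_point_variant(x, y, n_candidates=10, seed=1))
```

Comparing the two functions on the same data gives a rough sense of how much a randomized split-point rule gives up relative to exhaustive weighted-variance search, which is the trade-off the hybrid method described in the abstract is intended to address.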

Updated: 2014-07-02