Double random forest
Machine Learning (IF 4.3), Pub Date: 2020-07-02, DOI: 10.1007/s10994-020-05889-1
Sunwoo Han , Hyunjoong Kim , Yung-Seop Lee

Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to tune when fitting an RF is the nodesize, which controls the size of the individual trees. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation suggests that prediction accuracy could be improved further if we could generate even bigger trees than those obtained with the minimum nodesize. In other words, the largest tree that the minimum nodesize parameter allows may still not be large enough for the best performance of RF. To produce bigger trees than those grown by RF, we propose a new classification ensemble method called double random forest (DRF). The new method draws a bootstrap sample at each node during tree construction, instead of bootstrapping only once at the root node as in RF. This, in turn, yields an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we demonstrate that DRF provides more accurate predictions than RF.
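The abstract's key mechanism, resampling at every node rather than only once at the root, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the tree builder, the Gini split criterion, and the demo data are simplified choices of ours. In this sketch, each node searches for its best split on a fresh bootstrap of that node's rows, then applies the chosen split to the node's original rows before recursing.

```python
import numpy as np

def grow_tree(X, y, rng, min_node=1, n_feat=None):
    """Grow one DRF-style tree: bootstrap at EVERY node (the DRF idea),
    not just once at the root as in a standard random forest."""
    if len(y) <= min_node or len(np.unique(y)) == 1:
        return {"leaf": np.bincount(y).argmax()}
    # Node-level bootstrap: resample this node's rows with replacement
    # and search for the best split on the resampled data.
    idx = rng.integers(0, len(y), len(y))
    Xb, yb = X[idx], y[idx]
    n_feat = n_feat or max(1, int(np.sqrt(X.shape[1])))
    feats = rng.choice(X.shape[1], n_feat, replace=False)
    best = None
    for f in feats:
        for t in np.unique(Xb[:, f]):
            left, right = yb[Xb[:, f] <= t], yb[Xb[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Weighted Gini impurity of the candidate split.
            gini = sum(len(s) / len(yb) *
                       (1 - ((np.bincount(s) / len(s)) ** 2).sum())
                       for s in (left, right))
            if best is None or gini < best[0]:
                best = (gini, f, t)
    if best is None:
        return {"leaf": np.bincount(y).argmax()}
    _, f, t = best
    # Apply the chosen split to the ORIGINAL node data, then recurse.
    mask = X[:, f] <= t
    if mask.all() or (~mask).all():
        return {"leaf": np.bincount(y).argmax()}
    return {"feat": f, "thr": t,
            "lo": grow_tree(X[mask], y[mask], rng, min_node, n_feat),
            "hi": grow_tree(X[~mask], y[~mask], rng, min_node, n_feat)}

def predict_one(node, x):
    while "leaf" not in node:
        node = node["lo"] if x[node["feat"]] <= node["thr"] else node["hi"]
    return node["leaf"]

def drf_predict(trees, X):
    # Majority vote over the ensemble.
    votes = np.array([[predict_one(t, x) for x in X] for t in trees])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Tiny demo on a well-separated two-cluster problem (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
trees = [grow_tree(X, y, rng) for _ in range(25)]
acc = (drf_predict(trees, X) == y).mean()
```

Because duplicated rows in the node-level bootstrap make a node look larger than the stopping threshold, splitting can continue where a standard RF tree would have stopped, which is how DRF grows bigger, more diverse trees.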

Updated: 2020-07-02