Tree aggregation for random forest class probability estimation

Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2020-01-02, DOI: 10.1002/sam.11446
Andrew J. Sage, Ulrike Genschel, Dan Nettleton

In random forest methodology, an overall prediction or estimate is made by aggregating predictions made by individual decision trees. Popular implementations of random forests rely on different methods for aggregating predictions. In this study, we provide an empirical analysis of the performance of aggregation approaches available for classification and regression problems. We show that while the choice of aggregation scheme usually has little impact in regression, it can have a profound effect on probability estimation in classification problems. Our study illustrates the causes of calibration issues that arise from two popular aggregation approaches and highlights the important role that terminal nodesize plays in the aggregation of tree predictions. We show that optimal choices for random forest tuning parameters depend heavily on the manner in which tree predictions are aggregated.
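
To make the contrast between aggregation schemes concrete, here is a minimal sketch in plain Python. The abstract does not name the two approaches it studies, so this assumes the two schemes most commonly contrasted: counting one vote per tree and reporting vote shares, versus averaging the class proportions found in each tree's terminal node. The function names and the per-tree numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def aggregate_by_votes(tree_probs: Sequence[Sequence[float]]) -> List[float]:
    """Each tree votes for its most probable class; return the vote shares."""
    n_classes = len(tree_probs[0])
    votes = [0] * n_classes
    for probs in tree_probs:
        # Ties are broken in favour of the lowest class index (an assumption).
        winner = max(range(n_classes), key=lambda k: probs[k])
        votes[winner] += 1
    return [v / len(tree_probs) for v in votes]


def aggregate_by_averaging(tree_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the terminal-node class proportions across trees."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    return [sum(p[k] for p in tree_probs) / n_trees for k in range(n_classes)]


# Hypothetical terminal-node class proportions from four trees for a single
# observation (two classes). Larger terminal nodes give fractional proportions.
large_node_trees = [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]
vote_estimate = aggregate_by_votes(large_node_trees)
average_estimate = aggregate_by_averaging(large_node_trees)

# With pure terminal nodes every proportion is 0 or 1, so the two schemes
# return the same estimate.
pure_node_trees = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pure_vote_estimate = aggregate_by_votes(pure_node_trees)
pure_average_estimate = aggregate_by_averaging(pure_node_trees)
```

In this toy input, vote counting puts the probability of the first class at 0.75 while averaging puts it just under 0.5, so the two schemes can disagree noticeably when terminal nodes are impure. When every terminal node is pure, the two estimates coincide, which is one way to see why terminal node size interacts with the choice of aggregation scheme.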

Updated: 2020-01-02