Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data, Journal of Computational and Graphical Statistics

Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2016-07-02, DOI: 10.1080/10618600.2015.1089774
Leo L Duan, John P Clancy, Rhonda D Szczesniak

We propose a novel “tree-averaging” model that uses an ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for each subset. We developed an efficient estimation procedure with improved estimation strategies for both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplementary materials for this article are available online.
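The abstract's two core ideas, a Dirichlet process over data subsets and weighted averaging of tree predictions, can be illustrated with a minimal sketch. It is not the authors' estimation procedure. It only draws a partition from a Chinese restaurant process (a standard representation of the Dirichlet process) to show that the number of subsets, and hence trees, is not fixed in advance, and then combines hypothetical per-tree predictions by weight. The function names `crp_partition` and `weighted_average`, the concentration parameter `alpha`, and the use of subset proportions as weights are all assumptions made for illustration.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n observations from a Chinese restaurant process.

    Illustrative only: the paper's sampler also conditions on the data and
    on the fitted trees, which this prior draw does not.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of observations currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight sizes[k];
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels, sizes


def weighted_average(tree_predictions, weights):
    """Combine per-tree predictions for one observation with normalized weights."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, tree_predictions)) / total


if __name__ == "__main__":
    labels, sizes = crp_partition(n=200, alpha=1.0, seed=42)
    print("number of subsets (one tree each):", len(sizes))
    # Hypothetical per-tree predictions for a single new observation.
    preds = [float(k) for k in range(len(sizes))]
    print("weighted prediction:", weighted_average(preds, sizes))
```

In this sketch a larger `alpha` tends to open more subsets. In the model described above, the weights would come from the fitted mixture rather than from raw subset sizes.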

Updated: 2016-07-02
