Bayesian Additive Regression Trees using Bayesian Model Averaging.
Statistics and Computing ( IF 1.6 ) Pub Date : 2017-07-27 , DOI: 10.1007/s11222-017-9767-1
Belinda Hernández 1 , Adrian E Raftery 2 , Stephen R Pennington 3 , Andrew C Parnell 1, 4

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive versus non-aggressive prostate cancer. We compare the results of BART-BMA with those of its main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
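
The abstract names Bayesian model averaging but does not say how the averaging weights are computed. The sketch below illustrates the general idea only. It assumes each candidate model is summarised by a BIC value and that posterior model probabilities are approximated as proportional to exp(-BIC/2), which is a common choice in the BMA literature and is not confirmed here as the authors' exact procedure. The function names and the numbers are hypothetical.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumption: weight_k is proportional to exp(-BIC_k / 2).
    Subtracting the smallest BIC first avoids numerical underflow.
    """
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Model-averaged prediction for a single observation."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical BIC values and point predictions from three candidate models.
bics = [100.0, 102.0, 110.0]
preds = [1.0, 2.0, 4.0]

weights = bma_weights(bics)
averaged = bma_predict(preds, weights)
```

Models with lower BIC receive more weight, so the averaged prediction is pulled towards the best-supported models while still reflecting the others.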

Updated: 2017-07-27