Variable selection with ABC Bayesian forests
Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 3.1), Pub Date: 2021-04-05, DOI: 10.1111/rssb.12423
Yi Liu, Veronika Ročková, Yuexi Wang

Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants thereof. In this work, we abandon the linear model framework, which can be quite detrimental when the covariates impact the outcome in a non-linear way, and turn to tree-based methods for variable selection. Such variable screening is traditionally done by pruning down large trees or by ranking variables based on some importance measure. Despite being heavily used in practice, these ad hoc selection rules are not yet well understood from a theoretical point of view. In this work, we devise a Bayesian tree-based probabilistic method and show that it is consistent for variable selection when the regression surface is a smooth mix of p > n covariates. These results are the first model selection consistency results for Bayesian forest priors. Probabilistic assessment of variable importance is made feasible by a spike-and-slab wrapper around sum-of-trees priors. Sampling from posterior distributions over trees is inherently very difficult. As an alternative to Markov chain Monte Carlo (MCMC), we propose ABC Bayesian forests, a new approximate Bayesian computation (ABC) sampling method based on data splitting that achieves a higher ABC acceptance rate. We show that the method is robust and successful at finding variables with high marginal inclusion probabilities. Our ABC algorithm provides a new avenue towards approximating the median probability model in non-parametric setups where the marginal likelihood is intractable.
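The ingredients named in the abstract — a spike-and-slab draw over variable subsets, a random data split, an ABC accept/reject step on held-out fit, and marginal inclusion probabilities feeding the median probability model — can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it substitutes a crude 1-nearest-neighbour regressor for the Bayesian sum-of-trees fit, uses simulated data, and accepts the best 10% of draws by held-out error (a common ABC rejection variant); all names and tuning values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: only x0 and x1 drive the (non-linear) response;
# the remaining covariates are redundant noise.
n, p = 300, 6
X = rng.uniform(size=(n, p))
y = np.sin(4 * X[:, 0]) + 2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

def predict_1nn(X_fit, y_fit, X_val):
    # Crude non-parametric regressor, standing in for the sum-of-trees fit.
    d = ((X_val[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    return y_fit[d.argmin(axis=1)]

draws, errors = [], []
for _ in range(2000):
    subset = rng.random(p) < 0.5              # spike-and-slab style inclusion draw
    if not subset.any():
        continue
    perm = rng.permutation(n)                 # random data split
    fit, val = perm[: n // 2], perm[n // 2:]
    pred = predict_1nn(X[fit][:, subset], y[fit], X[val][:, subset])
    draws.append(subset)
    errors.append(np.sqrt(np.mean((y[val] - pred) ** 2)))

# ABC acceptance: keep draws whose held-out error falls in the lowest 10%.
errors = np.array(errors)
accepted = np.array(draws)[errors <= np.quantile(errors, 0.1)]

inclusion_prob = accepted.mean(axis=0)        # marginal inclusion probabilities
median_model = np.flatnonzero(inclusion_prob > 0.5)
print(inclusion_prob.round(2), median_model)
```

Under this toy setup, subsets containing the two active covariates predict the held-out half better, so they dominate the accepted draws and receive inclusion probabilities above one half, placing them in the median probability model.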

Updated: 2021-04-05