Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings, Journal of Computational and Graphical Statistics


Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-06-02, DOI: 10.1080/10618600.2022.2074428
Dongjin Li, Somak Dutta, Vivekananda Roy

Abstract

We develop a Bayesian variable selection method, called SVEN, based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. Sparsity is achieved by using degenerate spike priors on inactive variables, whereas Gaussian slab priors are placed on the coefficients of the important predictors, making the posterior probability of a model available in explicit form (up to a normalizing constant). By embedding a unique model-based screening and using fast Cholesky updates, SVEN produces a highly scalable computational framework to explore gigantic model spaces, rapidly identify the regions of high posterior probability, and make fast inference and prediction. A temperature schedule is used to further mitigate multimodal posterior distributions. The temperature value is guided by our model selection consistency results, which hold even when the norm of mean effects solely due to the unimportant variables diverges. An appealing byproduct of SVEN is the construction of novel model-weight-adjusted prediction intervals. The performance of SVEN is demonstrated through a number of simulation experiments and a real data example from a genome-wide association study with over half a million markers. Supplementary materials for this article are available online.
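
The abstract notes that Gaussian slab priors make the posterior probability of a model available in explicit form up to a normalizing constant. The sketch below illustrates what such a closed form can look like; it is not the authors' implementation. It assumes a flat prior on the intercept, a prior on the error variance proportional to 1/sigma^2, slab priors N(0, sigma^2/lam) on the active coefficients, and independent Bernoulli(w) inclusion indicators. The function name log_model_posterior and the parameters lam and w are hypothetical, and the exact expression in the paper may differ. A full Cholesky factorization is recomputed per model here, whereas the abstract refers to fast Cholesky updates.

```python
import numpy as np


def log_model_posterior(X, y, model, lam=1.0, w=0.1):
    """Unnormalized log posterior of a model under the assumptions stated above.

    X     : (n, p) design matrix
    y     : (n,) response vector
    model : indices of the variables treated as active (slab prior)
    lam   : slab precision parameter (hypothetical name)
    w     : prior inclusion probability (hypothetical name)
    """
    n, p = X.shape
    idx = np.asarray(model, dtype=int)
    k = idx.size

    # A flat prior on the intercept is handled by centering the response.
    yc = y - y.mean()

    # Independent Bernoulli(w) prior over the inclusion indicators.
    log_prior = k * np.log(w) + (p - k) * np.log1p(-w)

    if k == 0:
        # Null model: only the centered sum of squares remains.
        return -0.5 * (n - 1) * np.log(yc @ yc) + log_prior

    # Centered columns of the active variables.
    Xg = X[:, idx] - X[:, idx].mean(axis=0)

    # A = Xg'Xg + lam * I is symmetric positive definite, so Cholesky applies.
    A = Xg.T @ Xg + lam * np.eye(k)
    L = np.linalg.cholesky(A)  # A = L L'

    # Ridge-type residual sum of squares: y'y - y'Xg A^{-1} Xg'y = y'y - z'z.
    z = np.linalg.solve(L, Xg.T @ yc)
    rss = yc @ yc - z @ z

    # log|A| from the diagonal of the Cholesky factor.
    log_det = 2.0 * np.sum(np.log(np.diag(L)))

    return (0.5 * k * np.log(lam) - 0.5 * log_det
            - 0.5 * (n - 1) * np.log(rss) + log_prior)
```

Calling log_model_posterior(X, y, [0, 3]) scores the model containing columns 0 and 3. Differences between scores of two models are log posterior odds under the stated assumptions, and dividing a score by a temperature greater than one would flatten the posterior in the spirit of the temperature schedule mentioned in the abstract.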


Updated: 2022-06-02
