A simple new approach to variable selection in regression, with application to genetic fine mapping
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 3.1), Pub Date: 2020-07-10, DOI: 10.1111/rssb.12388
Gao Wang (1), Abhishek Sarkar (1), Peter Carbonetto (1, 2), Matthew Stephens (1, 3)

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model—the ‘sum of single effects’ model, called ‘SuSiE’—which comes from writing the sparse vector of regression coefficients as a sum of ‘single‐effect’ vectors, each with one non‐zero element. We also introduce a corresponding new fitting procedure—iterative Bayesian stepwise selection (IBSS)—which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods but, instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under SuSiE. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a credible set of variables for each selection. Our methods are particularly well suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and we illustrate their application to fine mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential and challenges for applying these methods to generic variable‐selection problems.
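The model and the IBSS loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software. The function names (single_effect_regression, ibss, credible_sets) are hypothetical. X and y are centered with no intercept, the prior over which variable carries each effect is uniform, and the residual variance sigma2 and the prior effect variance sigma0_2 are held fixed rather than estimated.

Under SuSiE, y = Xb + e with e ~ N(0, sigma2 I) and b = b_1 + ... + b_L. Each single effect b_l has exactly one non-zero entry, and that entry's value is drawn from N(0, sigma0_2). Each IBSS step fits one b_l by Bayesian single-effect regression on the residual left by the other effects. This produces a probability vector alpha_l over variables instead of a single chosen index. A 95% credible set for effect l is then the smallest set of variables whose alpha_l values sum to at least 0.95. The purity filter, which drops any set whose members have a minimum absolute correlation below 0.5, is an added assumption used here to discard the diffuse sets produced by unused effects.

```python
import numpy as np


def single_effect_regression(X, r, sigma2, sigma0_2):
    """Bayesian single-effect regression (SER) of r on the columns of X.

    Assumes exactly one column has a non-zero coefficient beta ~ N(0, sigma0_2),
    a uniform prior over which column it is, and a known residual variance sigma2.
    """
    xtx = np.sum(X**2, axis=0)                 # X_j'X_j for each column j
    bhat = X.T @ r / xtx                       # univariate least-squares estimates
    s2 = sigma2 / xtx                          # their sampling variances
    # log Bayes factor for "column j carries the effect" versus "no effect"
    lbf = 0.5 * np.log(s2 / (s2 + sigma0_2)) + 0.5 * bhat**2 / s2 * sigma0_2 / (sigma0_2 + s2)
    alpha = np.exp(lbf - lbf.max())            # uniform prior cancels on normalizing
    alpha /= alpha.sum()                       # posterior inclusion probabilities
    s1_2 = 1.0 / (1.0 / s2 + 1.0 / sigma0_2)   # posterior variance of beta given column j
    mu1 = s1_2 * bhat / s2                     # posterior mean of beta given column j
    return alpha, mu1, s1_2


def ibss(X, y, L=5, sigma2=1.0, sigma0_2=1.0, max_iter=100, tol=1e-8):
    """Iterative Bayesian stepwise selection; sigma2 and sigma0_2 are held fixed (assumption)."""
    X = X - X.mean(axis=0)                     # center so no intercept is needed
    y = y - y.mean()
    n, p = X.shape
    alpha = np.zeros((L, p))
    bbar = np.zeros((L, p))                    # posterior mean of each single effect b_l
    fitted = np.zeros(n)                       # X @ (bbar_1 + ... + bbar_L)
    for _ in range(max_iter):
        previous = bbar.copy()
        for l in range(L):
            fitted -= X @ bbar[l]              # drop effect l from the current fit
            alpha[l], mu1, _ = single_effect_regression(X, y - fitted, sigma2, sigma0_2)
            bbar[l] = alpha[l] * mu1           # refit effect l on the partial residual
            fitted += X @ bbar[l]
        if np.max(np.abs(bbar - previous)) < tol:
            break
    pip = 1.0 - np.prod(1.0 - alpha, axis=0)  # posterior inclusion probability per variable
    return {"alpha": alpha, "bbar": bbar, "pip": pip}


def credible_sets(alpha, X, coverage=0.95, min_purity=0.5):
    """Smallest set per effect reaching `coverage`; the purity filter is an assumption."""
    sets = []
    for a in alpha:
        order = np.argsort(-a)                 # variables by decreasing alpha
        k = int(np.searchsorted(np.cumsum(a[order]), coverage)) + 1
        cs = sorted(int(j) for j in order[:k])
        if len(cs) == 1:
            purity = 1.0
        else:
            purity = np.abs(np.corrcoef(X[:, cs], rowvar=False)).min()
        if purity >= min_purity:
            sets.append(cs)
    return sets


# Simulated example: two true effects, one of them on a pair of identical columns.
rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.standard_normal((n, p))
X[:, 4] = X[:, 3]                              # variables 3 and 4 are indistinguishable
y = X[:, 3] + X[:, 30] + rng.standard_normal(n)
fit = ibss(X, y, L=5)
sets = credible_sets(fit["alpha"], X)
print(sets)
print(np.round(fit["pip"][[3, 4, 30]], 2))
```

In this simulated example, columns 3 and 4 are identical, so the data cannot say which of them carries the effect. One effect should therefore split its alpha roughly equally over the pair and report the credible set [3, 4], while the effect of column 30 should yield a set containing only that variable. L = 5 is deliberately larger than the two true effects. The unused effects should give diffuse, low-purity sets that the filter removes, although the exact sets printed depend on the random seed.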

Updated: 2020-07-10