Spike-and-Slab Group Lassos for Grouped Regression and Sparse Generalized Additive Models
Journal of the American Statistical Association (IF 3.7), Pub Date: 2020-06-08, DOI: 10.1080/01621459.2020.1765784
Ray Bai, Gemma E. Moran, Joseph L. Antonelli, Yong Chen, Mary R. Boland

Abstract: We introduce the spike-and-slab group lasso (SSGL) for Bayesian estimation and variable selection in linear regression with grouped variables. We further extend the SSGL to sparse generalized additive models (GAMs), thereby introducing the first nonparametric variant of the spike-and-slab lasso methodology. Our model performs group selection and estimation simultaneously, while our fully Bayes treatment of the mixture proportion allows for model complexity control and automatic self-adaptivity to different levels of sparsity. We develop theory to uniquely characterize the global posterior mode under the SSGL and introduce a highly efficient block coordinate ascent algorithm for maximum a posteriori estimation. We further employ de-biasing methods to provide uncertainty quantification of our estimates. Implementation of our model thus avoids the computational burden of Markov chain Monte Carlo in high dimensions. We derive posterior concentration rates for both grouped linear regression and sparse GAMs when the number of covariates grows at a nearly exponential rate with sample size. Finally, we illustrate our methodology through extensive simulations and data analysis. Supplementary materials for this article are available online.
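
For readers unfamiliar with block coordinate updates on grouped coefficients, the sketch below shows the idea for the ordinary group lasso. It is not the SSGL algorithm from the article, whose updates are not given in the abstract. The function names, the penalty scaling by the square root of the group size, and the assumption that each group's design block satisfies X_g^T X_g / n = I are illustrative choices, not taken from the article.

```python
import numpy as np

def group_soft_threshold(z, threshold):
    """Shrink the vector z toward zero by `threshold` in Euclidean norm."""
    norm = np.linalg.norm(z)
    if norm <= threshold:
        return np.zeros_like(z)
    return (1.0 - threshold / norm) * z

def group_lasso_bcd(X, y, groups, lam, n_iter=200):
    """Block coordinate descent for the ordinary group lasso (illustrative).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_g sqrt(m_g) * ||b_g||_2,
    ASSUMING each block satisfies X_g^T X_g / n = I, so that every block
    update has the closed form used below.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for idx in groups:
            Xg = X[:, idx]
            # Add this group's fit back to obtain the partial residual.
            partial = residual + Xg @ beta[idx]
            z = Xg.T @ partial / n
            # Whole-group shrinkage: the group is either zeroed or kept.
            beta[idx] = group_soft_threshold(z, lam * np.sqrt(len(idx)))
            residual = partial - Xg @ beta[idx]
    return beta
```

Each pass updates one group of coefficients at a time while holding the others fixed, and the thresholding step sets an entire group to zero when its signal is weak, which is how selection and estimation happen together in this simplified setting.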




Updated: 2020-06-08