Variable selection for partially linear models via Bayesian subset modeling with diffusing prior
Journal of Multivariate Analysis (IF 1.6). Published: 2021-02-13. DOI: 10.1016/j.jmva.2021.104733
Jia Wang, Xizhen Cai, Runze Li

Most existing methods for variable selection in partially linear models (PLM) with ultrahigh-dimensional covariates are based on partial residuals and therefore involve a two-step estimation procedure. Estimation error from the first step may affect the second step, and multicollinearity among predictors poses additional challenges for model selection. In this paper, we propose a new Bayesian variable selection approach for PLM that addresses both issues simultaneously: (1) it is a one-step method that selects variables in PLM even when the dimension of the covariates grows exponentially with the sample size, and (2) it retains model selection consistency and outperforms existing methods when predictors are highly correlated. Unlike existing approaches, the proposed procedure uses a difference-based method to reduce the impact of estimating the nonparametric component, and incorporates Bayesian subset modeling with diffusing prior (BSM-DP) to shrink the estimator of the linear component. Estimation is carried out by Gibbs sampling, and we prove that the posterior probability of selecting the true model converges to one asymptotically. Simulation studies support the theory and demonstrate the efficiency of the proposed method relative to existing ones, and an application to supermarket data illustrates its use.
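The difference-based idea can be illustrated with a minimal Python sketch for the model y = Xβ + g(z) + ε. It assumes a scalar nonparametric covariate z, first-order differencing of the observations after sorting by z, and a low-dimensional linear part fitted by ordinary least squares. The paper's actual differencing order and weights are not stated in this abstract, and in the proposed procedure the linear component is handled by BSM-DP with Gibbs sampling rather than least squares. The function name `difference_transform` and the simulated data are hypothetical.

```python
import numpy as np

def difference_transform(y, X, z):
    """Hypothetical first-order difference-based transform (an assumption,
    not necessarily the paper's scheme) for y = X @ beta + g(z) + eps.

    Observations are sorted by the scalar covariate z and successive
    differences are taken. If g is smooth and the z values are dense,
    g(z_(i+1)) - g(z_(i)) is close to zero, so the differenced data follow
    an approximately linear model in beta with no nonparametric term.
    """
    order = np.argsort(z)
    y_sorted, X_sorted = y[order], X[order]
    # Row i of the output is (observation i+1) - (observation i) in z-order.
    dy = np.diff(y_sorted)
    dX = np.diff(X_sorted, axis=0)
    return dy, dX

# Simulated example (hypothetical data, low-dimensional for illustration).
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
z = rng.uniform(0.0, 1.0, size=n)
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
g = np.sin(2 * np.pi * z)  # smooth nonparametric component
y = X @ beta + g + rng.normal(scale=0.5, size=n)

dy, dX = difference_transform(y, X, z)
# Least squares on the differenced data; BSM-DP would replace this step.
beta_hat, *_ = np.linalg.lstsq(dX, dy, rcond=None)
print(np.round(beta_hat, 2))
```

Differencing removes g(z) without estimating it, at the cost of errors ε_(i+1) − ε_(i) that have twice the original variance and are correlated between adjacent rows.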




Updated: 2021-02-24