High‐dimensional variable screening under multicollinearity
Stat (IF 0.7), Pub Date: 2020-04-22, DOI: 10.1002/sta4.272
Naifei Zhao¹, Qingsong Xu², Man‐Lai Tang³, Binyan Jiang⁴, Ziqi Chen², Hong Wang²

Variable screening is of fundamental importance in linear regression models when the number of predictors far exceeds the number of observations. Multicollinearity, in which two or more predictor variables are highly correlated, is common in high‐dimensional settings and makes high‐dimensional variable screening notoriously difficult. The sure independence screening (SIS) procedure can greatly reduce the dimensionality, but it may break down when the predictors are highly correlated. By combining factor modelling with SIS, the profiled independence screening (PIS) approach was proposed. However, under a spiked population model, the profiled predictors cannot be guaranteed to be uncorrelated, so PIS may be misleading. Instead of assuming that either the predictors (as in SIS) or the profiled predictors (as in PIS) are uncorrelated, a more general and challenging scenario is considered, in which the predictors can be highly correlated. A preconditioned PIS (PPIS) method is proposed, which produces asymptotically uncorrelated profiled predictors and thus leads to consistent model selection under a spiked population model. Compared with PIS, the proposed method can handle complex multicollinearity, such as a spiked population model whose population covariance matrix has a slowly decaying spectrum, while keeping the computation simple. The promising performance of the proposed PPIS method is illustrated via extensive simulation studies and two real examples.
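SIS, which the abstract takes as its starting point, scores each predictor by the absolute sample correlation between that predictor and the response, then keeps the d highest-scoring predictors. A minimal pure-Python sketch of that ranking rule follows; the helper names `pearson` and `sis_screen` and the screening size `d` are illustrative choices, and this is not code from the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant vector carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(X: List[List[float]], y: Sequence[float], d: int) -> List[int]:
    """SIS ranking: keep the d columns of X (n rows, p columns) whose
    absolute marginal correlation with y is largest."""
    p = len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    scores = [abs(pearson(col, y)) for col in columns]
    # Sort column indices by score, highest first, and keep the top d.
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])
```

Here `X` is a list of n rows with p entries each, and `d` would typically be chosen well below n. Because every score is marginal, a predictor that is strongly correlated with an important predictor can score nearly as high as the important one, which is the multicollinearity problem the abstract describes.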
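The profiling idea behind PIS can be sketched in three steps: estimate a common factor from the predictors, project it out of both the predictors and the response, and apply the SIS ranking to the residuals. The sketch below assumes a single factor, estimated as the top eigenvector of X Xᵀ for the column-centred design and computed by power iteration. The paper's actual factor estimator, its choice of the number of factors, and the preconditioning step that defines PPIS are not given in the abstract and are not reproduced here; all names (`leading_factor`, `profile_out`, `profiled_screen`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; (numerically) constant inputs score 0.0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    tiny = 1e-20
    return 0.0 if sxx <= tiny or syy <= tiny else sxy / math.sqrt(sxx * syy)


def leading_factor(X: List[List[float]], iters: int = 1000,
                   tol: float = 1e-12, seed: int = 0) -> List[float]:
    """Unit-norm top eigenvector of X X^T (X is n x p), by power iteration.

    Assumption of this sketch: this vector is the single estimated
    latent factor that gets profiled out.
    """
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        xtu = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # X^T u
        w = [sum(X[i][j] * xtu[j] for j in range(p)) for i in range(n)]  # X X^T u
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            raise ValueError("X X^T u vanished; no factor to extract")
        w = [a / norm for a in w]
        converged = max(abs(a - b) for a, b in zip(w, u)) < tol
        u = w
        if converged:
            break
    return u


def profile_out(v: Sequence[float], u: Sequence[float]) -> List[float]:
    """Residual of v after projecting out the unit vector u: (I - u u^T) v."""
    dot = sum(a * b for a, b in zip(u, v))
    return [a - dot * b for a, b in zip(v, u)]


def profiled_screen(X: List[List[float]], y: Sequence[float],
                    d: int) -> Tuple[List[int], List[List[float]]]:
    """Profile one estimated factor out of X and y, then apply SIS ranking.

    Returns the d selected column indices and the profiled columns.
    """
    n, p = len(X), len(X[0])
    # Centre every column and the response so the factor is not just the mean.
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        m = sum(col) / n
        cols.append([a - m for a in col])
    my = sum(y) / n
    yc = [b - my for b in y]
    Xc = [[cols[j][i] for j in range(p)] for i in range(n)]
    u = leading_factor(Xc)
    prof_cols = [profile_out(c, u) for c in cols]
    prof_y = profile_out(yc, u)
    scores = [abs(_corr(c, prof_y)) for c in prof_cols]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d]), prof_cols
```

The single-factor version keeps the sketch short; in practice more than one factor may need to be profiled out. As the abstract notes, under a spiked population model residuals such as `prof_cols` are not guaranteed to be uncorrelated, which is the gap the PPIS preconditioning is designed to close.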

Updated: 2020-04-22