Model selection procedure for high-dimensional data.
Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2010-09-08, DOI: 10.1002/sam.10088
Yongli Zhang, Xiaotong Shen

For high‐dimensional regression, the number of predictors may greatly exceed the sample size but only a small fraction of them are related to the response. Therefore, variable selection is inevitable, where consistent model selection is the primary concern. However, conventional consistent model selection criteria like Bayesian information criterion (BIC) may be inadequate due to their nonadaptivity to the model space and infeasibility of exhaustive search. To address these two issues, we establish a probability lower bound of selecting the smallest true model by an information criterion, based on which we propose a model selection criterion, what we call RICc, which adapts to the model space. Furthermore, we develop a computationally feasible method combining the computational power of least angle regression (LAR) with that of RICc. Both theoretical and simulation studies show that this method identifies the smallest true model with probability converging to one if the smallest true model is selected by LAR. The proposed method is applied to real data from the power market and outperforms the backward variable selection in terms of price forecasting accuracy. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 350‐358, 2010
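
The two-stage idea in the abstract (first generate a short path of candidate models, then pick one of them with an information criterion instead of searching all subsets) can be illustrated with a rough sketch. This is not the authors' method: greedy forward selection stands in for LAR, and plain BIC stands in for RICc, whose formula is not given in the abstract. All function names and the simulated data below are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on columns `cols`."""
    if not cols:
        return sum(v * v for v in y)
    gram = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    beta = solve(gram, rhs)
    return sum(
        (yi - sum(bj * row[c] for bj, c in zip(beta, cols))) ** 2
        for row, yi in zip(X, y)
    )


def candidate_path(X, y, max_size):
    """Stage 1 (stand-in for LAR): nested models built by greedy forward selection."""
    p = len(X[0])
    path, active = [[]], []
    for _ in range(max_size):
        remaining = [j for j in range(p) if j not in active]
        best = min(remaining, key=lambda j: rss(X, y, active + [j]))
        active = active + [best]
        path.append(active)
    return path


def bic(X, y, cols):
    """Stage 2 (stand-in for RICc): BIC = n*log(RSS/n) + k*log(n)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + len(cols) * math.log(n)


def select_model(X, y, max_size):
    """Pick the model on the candidate path with the smallest criterion value."""
    return min(candidate_path(X, y, max_size), key=lambda cols: bic(X, y, cols))


# Simulated data (assumption): n = 40 samples, p = 60 predictors,
# only predictors 3 and 17 are related to the response.
random.seed(0)
n, p = 40, 60
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[3] - 2 * row[17] + random.gauss(0, 0.5) for row in X]

selected = select_model(X, y, max_size=5)
print(sorted(selected))
```

Only `max_size + 1` models are ever scored, so the criterion is evaluated along a path rather than over all subsets of the 60 predictors. With more predictors than samples, BIC in this sketch may keep a few spurious predictors next to the true ones, which is the kind of nonadaptivity to the model space that the abstract says motivates RICc.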

Updated: 2010-09-08