From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation
Journal of the American Statistical Association (IF 3.7), Pub Date: 2019-02-26, DOI: 10.1080/01621459.2018.1424632
Saharon Rosset, Ryan J. Tibshirani

ABSTRACT In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what we call the “Fixed-X” assumption, where covariate values are assumed to be nonrandom. By contrast, it is often more reasonable to take a “Random-X” view, where the covariate values are independently drawn for both training and prediction. To study the applicability of covariance penalties in this setting, we propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components. This decomposition is general, but we concentrate on the fundamental case of ordinary least-squares (OLS) regression. We prove that in this setting the move from Fixed-X to Random-X prediction results in an increase in both bias and variance. When the covariates are normally distributed and the linear model is unbiased, all terms in this decomposition are explicitly computable, which yields an extension of Mallows’ Cp that we call RCp. RCp also holds asymptotically for certain classes of nonnormal covariates. When the noise variance is unknown, plugging in the usual unbiased estimate leads to an approach that we call , which is closely related to Sp, and generalized cross-validation (GCV). For excess bias, we propose an estimate based on the “shortcut-formula” for ordinary cross-validation (OCV), resulting in an approach we call RCp+. Theoretical arguments and numerical simulations suggest that RCp+ is typically superior to OCV, though the difference is small. We further examine the Random-X error of other popular estimators. The surprising result we get for ridge regression is that, in the heavily regularized regime, Random-X variance is smaller than Fixed-X variance, which can lead to smaller overall Random-X error. Supplementary materials for this article are available online.
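As a rough illustration of the covariance-penalty idea in the Random-X setting, the following Python sketch compares Mallows' Cp with an RCp-style penalty for OLS under a Gaussian design. The penalty form RSS/n + sigma^2*(p/n + p/(n-p-1)), the simulation settings (n, p, n_rep, etc.), and the use of the true noise variance are assumptions made for illustration, not details taken verbatim from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 10, 1.0          # training size, number of covariates, noise sd
n_test, n_rep = 20000, 2000        # test-set size and number of replications
beta = rng.normal(size=p)          # true coefficients (linear model is unbiased)

cp_vals, rcp_vals, err_vals = [], [], []
for _ in range(n_rep):
    X = rng.normal(size=(n, p))                    # Random-X training design
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta_hat) ** 2)

    # Fixed-X covariance penalty (Mallows' Cp), true sigma^2 plugged in
    cp_vals.append(rss / n + 2 * sigma**2 * p / n)
    # RCp-style Random-X penalty for Gaussian covariates (assumed form)
    rcp_vals.append(rss / n + sigma**2 * (p / n + p / (n - p - 1)))

    # Monte-Carlo estimate of the true Random-X prediction error:
    # fresh covariates and responses drawn from the same distribution
    X0 = rng.normal(size=(n_test, p))
    y0 = X0 @ beta + sigma * rng.normal(size=n_test)
    err_vals.append(np.mean((y0 - X0 @ beta_hat) ** 2))

print("mean Random-X error :", np.mean(err_vals))
print("mean Cp (Fixed-X)   :", np.mean(cp_vals))
print("mean RCp (Random-X) :", np.mean(rcp_vals))
```

Under these assumptions, Cp should track the Fixed-X error (roughly sigma^2*(1 + p/n)) and therefore underestimate the Random-X error, while the RCp-style penalty should sit close to the Monte-Carlo estimate of the Random-X error; this is only a sketch of the phenomenon the abstract describes, not a reproduction of the paper's experiments.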

Updated: 2019-02-26