A scalable estimate of the out‐of‐sample prediction error via approximate leave‐one‐out cross‐validation, The Journal of the Royal Statistical Society, Series B (Statistical Methodology)



The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 5.8), Pub Date: 2020-06-20, DOI: 10.1111/rssb.12374
Kamiar Rahnama Rad, Arian Maleki


The paper considers the problem of out‐of‐sample risk estimation in high‐dimensional settings, where standard techniques such as K‐fold cross‐validation suffer from large biases. Motivated by the low bias of leave‐one‐out cross‐validation (LO), we propose a computationally efficient, closed‐form approximate leave‐one‐out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only minor computational overhead. Under mild assumptions about the data‐generating process, we obtain a finite‐sample upper bound on the difference between leave‐one‐out cross‐validation and approximate leave‐one‐out cross‐validation, |LO − ALO|. Our theoretical analysis shows that |LO − ALO| → 0 with overwhelming probability as n, p → ∞, where the dimension p of the feature vectors may be comparable with or even greater than the number of observations n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO − ALO| decreases as n and p increase, revealing the excellent finite‐sample performance of approximate leave‐one‐out cross‐validation. We further illustrate the usefulness of the proposed out‐of‐sample risk estimation method with an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
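To make the idea of a leave‐one‐out shortcut concrete, the sketch below uses ridge regression without an intercept, a special case in which the leave‐one‐out error has a classical exact closed form based on the diagonal of the hat matrix. It is not the paper's general ALO formula, which covers a broader class of losses and regularizers and is not reproduced here. The function names, the synthetic data, and the choices of n, p and the penalty `lam` are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate (no intercept): solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def loo_brute_force(X, y, lam):
    """Leave-one-out squared error computed by refitting n times."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = ridge_fit(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ beta_i) ** 2
    return errs.mean()


def loo_shortcut(X, y, lam):
    """Leave-one-out squared error from a single fit.

    Uses the classical ridge identity: the i-th leave-one-out residual
    equals (y_i - x_i' beta) / (1 - H_ii), where
    H = X (X'X + lam*I)^{-1} X' is the hat matrix.
    """
    p = X.shape[1]
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # shape (p, n)
    beta = G @ y
    h = np.einsum("ij,ji->i", X, G)  # diagonal of X @ G, i.e. H_ii
    resid = y - X @ beta
    return np.mean((resid / (1.0 - h)) ** 2)


# Synthetic data with p > n (illustrative values only).
rng = np.random.default_rng(0)
n, p, lam = 60, 80, 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lo = loo_brute_force(X, y, lam)
alo = loo_shortcut(X, y, lam)
print(f"brute-force LO: {lo:.6f}, single-fit shortcut: {alo:.6f}")
```

The brute‐force version refits the model n times, while the shortcut needs one linear solve; for ridge the two agree up to floating‐point error. For non‐quadratic losses or non‐smooth penalties no such exact identity is available in general, which is the setting where an approximation with a bound on |LO − ALO| becomes relevant.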


Updated: 2020-08-10
