Fast Cross-validation for Multi-penalty High-dimensional Ridge Regression
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-05-19, DOI: 10.1080/10618600.2021.1904962
Mark A. van de Wiel 1, Mirrelijn M. van Nee 1, Armin Rauschenberger 2

Abstract

High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners. Supplementary materials for this article are available online.
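
The computational gain described above comes from evaluating the hat matrix in sample space rather than feature space. Below is a minimal numerical sketch of that idea, assuming a block-diagonal penalty matrix and the standard Woodbury identity; the setup and variable names are illustrative, not the multiridge API, and the paper's exact formula may differ.

import numpy as np

# Illustrative sketch only: demonstrates how a multi-penalty, sample-weighted
# ridge hat matrix can be computed in n-dimensional (sample) space instead of
# p-dimensional (feature) space.
rng = np.random.default_rng(0)
n, p1, p2 = 50, 200, 300                  # n samples, two data types with p1 + p2 >> n features
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
lam1, lam2 = 10.0, 100.0                  # data-type-specific ridge penalties
w = rng.uniform(0.5, 2.0, n)              # IWLS sample weights

# n x n cross-products per data type, computed once; re-evaluating the hat
# matrix for new penalties or weights then costs O(n^3) rather than O(p^3).
K1, K2 = X1 @ X1.T, X2 @ X2.T
K = K1 / lam1 + K2 / lam2                 # X Lambda^{-1} X^T for block-diagonal Lambda

# Low-dimensional form: H = K (K + W^{-1})^{-1}
H_fast = K @ np.linalg.inv(K + np.diag(1.0 / w))

# High-dimensional form H = X (X^T W X + Lambda)^{-1} X^T W, for verification only.
X = np.hstack([X1, X2])
Lam = np.diag(np.concatenate([np.full(p1, lam1), np.full(p2, lam2)]))
H_slow = X @ np.linalg.solve(X.T @ (w[:, None] * X) + Lam, X.T * w)

print(np.allclose(H_fast, H_slow))        # True

In this sketch only the n x n blocks K1 and K2 depend on the data, so searching over the penalties (lam1, lam2) during cross-validation reuses them and never forms a p x p matrix.
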



Updated: 2021-05-19