Adaptive Handling of Dependence in High-Dimensional Regression Modeling
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-06-23, DOI: 10.1080/10618600.2022.2076687
Florian Hébert, David Causeur, Mathieu Emily

Abstract

Dependence within a high-dimensional profile of explanatory variables affects the estimation and prediction performance of regression models. However, the strong belief that dependence should not be ignored, which rests on well-established knowledge of low-dimensional regression modeling, does not necessarily hold in high dimensions. To investigate this point, we introduce a new class of prediction scores defined as linear combinations of the same random vector. The class includes the naive prediction score, obtained by ignoring dependence, and the Ordinary Least Squares (OLS) prediction score, which on the contrary fully accounts for dependence through a preliminary whitening of the explanatory variables. Interestingly, this class also contains the Ridge and Partial Least Squares prediction scores, which both offer intermediate ways of dealing with dependence. A theoretical comparative study first shows how the best handling of dependence depends on the interplay between the structure of conditional dependence across explanatory variables and the pattern of the association signal. We also derive a closed-form expression for the prediction score with the best prediction performance within the proposed class, leading to an adaptive handling of dependence. Finally, simulation studies and benchmark datasets demonstrate that this prediction score outperforms existing methods in various settings. Supplementary materials for this article are available online.
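
To make the contrast between ignoring and fully accounting for dependence concrete, the following is a minimal sketch and not the authors' method. It assumes standardized explanatory variables and fewer variables than observations, so that the sample correlation matrix is invertible. The function name `prediction_coefficients` and the parameter `ridge_penalty` are hypothetical, and the abstract does not specify the exact form of the proposed class or of the adaptive score, so neither is reproduced here. Under these assumptions, the naive coefficients are the marginal covariances with the response, the OLS coefficients additionally apply the inverse correlation matrix, and the Ridge coefficients sit between the two.

```python
import numpy as np


def prediction_coefficients(X, y, ridge_penalty=1.0):
    """Illustrative naive, Ridge and OLS coefficients on standardized data.

    Hypothetical helper for illustration only; it is not taken from the paper.
    Assumes n > p and no constant columns, so the correlation matrix is invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the explanatory variables and center the response.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    corr = Xs.T @ Xs / n   # sample correlation matrix of the predictors
    s = Xs.T @ yc / n      # marginal covariances between predictors and response

    naive = s.copy()                    # ignores dependence: correlation treated as identity
    ols = np.linalg.solve(corr, s)      # fully accounts for dependence
    ridge = np.linalg.solve(corr + ridge_penalty * np.eye(p), s)  # intermediate handling
    return {"naive": naive, "ridge": ridge, "ols": ols}
```

With `ridge_penalty=0` the Ridge coefficients coincide with the OLS ones, and as the penalty grows they become proportional to the naive ones, which is one way to see Ridge as an intermediate handling of dependence. A prediction score for a new standardized observation `x` would then be `x @ coefficients` for whichever coefficient vector is chosen.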




Updated: 2022-06-23