High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking.

Statistics and Computing (IF 2.2), Pub Date: 2019-12-19, DOI: 10.1007/s11222-019-09914-9
Fan Wang₁, Sach Mukherjee₂, Sylvia Richardson₁, Steven M Hill₁

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
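To make the shape of such a comparison concrete, here is a minimal sketch of a single synthetic scenario, scored on the three goals the abstract distinguishes (prediction, variable selection and ranking). It is not the authors' protocol: the abstract does not give the scenario settings or the tuning procedure, so the sample size, dimensionality, signal strength and the fixed penalty `lam=0.1` below are arbitrary assumptions, and all function names are hypothetical. Only three of the seven methods are covered (Lasso, Elastic Net and Ridge Regression), fitted by a plain coordinate-descent routine for the elastic-net objective, with no intercept and independent covariates.

```python
import random

def simulate(n, p, s, signal, rng):
    """n rows of p independent N(0,1) covariates; the first s coefficients equal `signal`."""
    beta = [signal] * s + [0.0] * (p - s)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, 1) for row in X]
    return X, y, beta

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
    alpha=1 gives the Lasso, alpha=0 gives Ridge Regression. No intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date after each coordinate update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of the linear fit on (X, y)."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y)) / len(y)

def evaluate(beta_hat, beta_true, X_test, y_test):
    """Score one fit on prediction, selection and ranking."""
    selected = {j for j, b in enumerate(beta_hat) if b != 0.0}
    truth = {j for j, b in enumerate(beta_true) if b != 0.0}
    # Rank variables by absolute coefficient and keep as many as there are true signals.
    ranking = sorted(range(len(beta_hat)), key=lambda j: -abs(beta_hat[j]))
    top = set(ranking[:len(truth)])
    return {
        "test_mse": mse(X_test, y_test, beta_hat),  # prediction
        "true_pos": len(selected & truth),          # variable selection
        "false_pos": len(selected - truth),
        "top_s_hits": len(top & truth),             # variable ranking
    }

rng = random.Random(0)
# One assumed scenario: 100 training and 50 test rows, 20 covariates, 3 true signals.
X, y, beta_true = simulate(n=150, p=20, s=3, signal=2.0, rng=rng)
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

methods = {"lasso": 1.0, "elastic_net": 0.5, "ridge": 0.0}  # method name -> alpha
results = {
    name: evaluate(elastic_net(X_train, y_train, lam=0.1, alpha=a),
                   beta_true, X_test, y_test)
    for name, a in methods.items()
}
for name, r in results.items():
    print(name, r)
```

Even this toy run shows why the goals need to be scored separately: Ridge Regression has no thresholding step, so it leaves every coefficient nonzero and cannot select variables on its own, yet it can still predict and rank. A study of the kind described would repeat such a run over many scenarios, varying sample size, dimensionality, sparsity, signal strength and multicollinearity, and would tune the penalty rather than fix it.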

Updated: 2019-12-19
