Variable Selection With Second-Generation P-Values
The American Statistician (IF 1.8), Pub Date: 2021-07-26, DOI: 10.1080/00031305.2021.1946150
Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume

Abstract

Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here, we report on a novel variable selection approach called penalized regression with second-generation p-values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an ℓ0 penalization scheme with second-generation p-values (SGPVs), instead of traditional p-values, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm maintains its good performance even when there is strong collinearity among features or when a high-dimensional feature space with p > n is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and minimax concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios.
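The selection rule described above — keep a variable only when the second-generation p-value of its coefficient is zero, i.e., its confidence interval lies entirely outside a small null interval [-δ, δ] — can be sketched as follows. This is a minimal illustration of the SGPV idea, not the published ProSGPV algorithm (which also includes a lasso-based screening stage); the choice of δ as the average coefficient standard error is an illustrative assumption, and the simulated data are hypothetical.

```python
import numpy as np

# Simulated sparse linear model: only coefficients 0 and 2 are truly nonzero.
rng = np.random.default_rng(0)
n, p = 200, 5
beta_true = np.array([3.0, 0.0, 2.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

# Ordinary least squares fit with per-coefficient standard errors.
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# 95% confidence intervals and an (assumed) null interval [-delta, delta].
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
delta = se.mean()  # illustrative null bound, not the paper's choice

def sgpv(lo_i, hi_i, delta):
    """Second-generation p-value: fraction of the CI overlapping the null
    interval, with the correction factor max(|I| / (2|I0|), 1) applied when
    the CI is more than twice as long as the null interval."""
    overlap = max(0.0, min(hi_i, delta) - max(lo_i, -delta))
    ci_len, null_len = hi_i - lo_i, 2.0 * delta
    return (overlap / ci_len) * max(ci_len / (2.0 * null_len), 1.0)

pvals = np.array([sgpv(l, h, delta) for l, h in zip(lo, hi)])
selected = np.flatnonzero(pvals == 0.0)  # keep variables whose SGPV is 0
print("SGPVs:", np.round(pvals, 3), "selected:", selected)
```

A variable with a strong, well-estimated effect has a confidence interval far from zero, so its SGPV is exactly 0 and it is retained; an interval overlapping the null region yields a positive SGPV and the variable is dropped.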



Updated: 2021-07-26