

Variable Selection via Partial Correlation
Statistica Sinica (IF 1.5), Pub Date: 2018-01-01, DOI: 10.5705/ss.202015.0473
Runze Li, Jingyuan Liu, Lejia Lou


A partial-correlation-based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as an alternative to regularization methods for variable selection. This paper addresses two important issues related to this method: (a) whether it is sensitive to the normality assumption, and (b) whether it remains valid when the dimension of the predictors grows at an exponential rate of the sample size. To address issue (a), we systematically study the method for elliptical linear regression models. Our findings indicate that the original proposal may lead to inferior performance when the marginal kurtosis of the predictors is not close to that of the normal distribution. Our simulation results further confirm this finding. To ensure the superior performance of the partial-correlation-based variable selection procedure, we propose a thresholded partial correlation (TPC) approach to select significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh-dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, we obtain the sure screening property of the first step of the TPC. The numerical examples also illustrate that the TPC is competitive with commonly used regularization methods for variable selection.
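To make the idea of selecting variables by partial correlation concrete, here is a minimal sketch in Python. It is not the TPC procedure from the paper, and it does not reproduce the paper's threshold. It only illustrates the underlying quantity: the sample partial correlation between the response and one predictor given other predictors, computed by correlating least-squares residuals, followed by a simple cutoff. The function names `partial_corr` and `select_by_cutoff`, the cutoff value 0.2, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def partial_corr(y, x, Z):
    """Sample partial correlation of y and x given the columns of Z (n x k)."""
    n = len(y)
    # Design matrix with an intercept; Z may have zero columns.
    D = np.column_stack([np.ones(n), Z])
    # Residuals of y and x after regressing each on [1, Z].
    ry = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
    rx = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]
    return float(ry @ rx / np.sqrt((ry @ ry) * (rx @ rx)))


def select_by_cutoff(y, X, cutoff):
    """Keep predictor j if |partial corr(y, X_j | all other columns)| > cutoff.

    Hypothetical one-shot rule for low-dimensional data only; the cutoff is
    a user-supplied illustration value, not the paper's threshold.
    """
    p = X.shape[1]
    keep = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        if abs(partial_corr(y, X[:, j], X[:, others])) > cutoff:
            keep.append(j)
    return keep


# Simulated example: y depends on x0 only, while x1 is a noisy copy of x0.
rng = np.random.default_rng(0)
n = 500
x0 = rng.standard_normal(n)
x1 = x0 + 0.5 * rng.standard_normal(n)
X = np.column_stack([x0, x1])
y = 2.0 * x0 + rng.standard_normal(n)

marginal_x1 = float(np.corrcoef(y, x1)[0, 1])  # large, driven by x0
partial_x1 = partial_corr(y, x1, X[:, [0]])    # near zero once x0 is given
selected = select_by_cutoff(y, X, cutoff=0.2)
```

In this simulation the marginal correlation between `y` and `x1` is large only because `x1` tracks `x0`, while the partial correlation given `x0` is close to zero, so a cutoff on partial correlations drops `x1` and keeps `x0`. How the cutoff should be chosen, in particular how it should account for the kurtosis of the predictors, is exactly the question the paper studies and is not settled by this sketch.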


Updated: 2018-01-01
