A Robust Model-Free Feature Screening Method for Ultrahigh-Dimensional Data, Journal of Computational and Graphical Statistics

Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2017-10-02, DOI: 10.1080/10618600.2017.1328364
Jingnan Xue, Faming Liang

ABSTRACT: Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this article, we introduce a new feature screening method and establish its sure independence screening property under the ultrahigh-dimensional setting. The proposed method is based on the nonparanormal transformation and the Henze–Zirkler test: it first transforms the response variable and the features to Gaussian random variables using the nonparanormal transformation, and then tests the dependence between the response variable and each feature using the Henze–Zirkler test. The proposed method enjoys at least two merits. First, it is model-free, which avoids the specification of a particular model structure. Second, it is condition-free, in that it requires no extra conditions beyond some regularity conditions for high-dimensional feature screening. The numerical results indicate that, compared to the existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. The proposed method is applied to the screening of anticancer drug response genes. Supplementary material for this article is available online.
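
The two-step idea in the abstract (Gaussianize each variable, then test dependence with a Henze–Zirkler-type statistic) can be illustrated with a minimal Python sketch. This is not the authors' implementation; it rests on several assumptions. The nonparanormal transformation is approximated by rank-based normal scores rather than a Winsorized estimator. The statistic measures the distance of each transformed pair (response, feature) from the standard bivariate normal N(0, I), which is the joint law when the two Gaussianized variables are independent, so no covariance standardization is applied. The smoothing parameter uses the usual Henze–Zirkler choice. The function names `nonparanormal`, `hz_statistic`, and `screen` are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def nonparanormal(v):
    """Map a sample to approximate N(0, 1) scores via its ranks (assumed variant)."""
    n = len(v)
    # rank / (n + 1) keeps the empirical CDF strictly inside (0, 1)
    return norm.ppf(rankdata(v) / (n + 1))


def hz_statistic(z, beta=None):
    """Henze-Zirkler-type distance of the rows of z from N(0, I_d).

    Assumption: the null distribution is standard normal with identity
    covariance, so the data are NOT standardized by a sample covariance.
    """
    n, d = z.shape
    if beta is None:
        # usual Henze-Zirkler smoothing parameter
        beta = ((2 * d + 1) * n / 4.0) ** (1.0 / (d + 4)) / np.sqrt(2.0)
    b2 = beta ** 2
    sq_norms = np.sum(z ** 2, axis=1)
    # pairwise squared Euclidean distances, clipped against rounding error
    pair = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T, 0.0)
    term1 = np.exp(-0.5 * b2 * pair).mean()
    term2 = 2.0 * (1.0 + b2) ** (-d / 2.0) * np.exp(
        -b2 * sq_norms / (2.0 * (1.0 + b2))
    ).mean()
    term3 = (1.0 + 2.0 * b2) ** (-d / 2.0)
    return n * (term1 - term2 + term3)


def screen(X, y, k):
    """Return the indices of the k features with the largest statistics, and all scores."""
    ty = nonparanormal(y)
    scores = np.array([
        hz_statistic(np.column_stack([ty, nonparanormal(X[:, j])]))
        for j in range(X.shape[1])
    ])
    top = np.argsort(scores)[::-1][:k]
    return top, scores
```

Calling `screen(X, y, k)` on an `n × p` feature matrix `X` and a response vector `y` ranks the features by how far each Gaussianized pair departs from independence and keeps the top `k`. Because only ranks enter the transformation, the scores are unchanged by monotone rescaling of `y` or of any feature, which is one informal way to see why such a procedure can tolerate heavy tails.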

Updated: 2017-10-02
