Model-Free Feature Screening and FDR Control With Knockoff Features
Journal of the American Statistical Association ( IF 3.0 ) Pub Date : 2020-07-20 , DOI: 10.1080/01621459.2020.1783274
Wanjun Liu 1 , Yuan Ke 2 , Jingyuan Liu 3 , Runze Li 4

Abstract

This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection-correlation-based method does not require specifying a regression model, and it applies to data with heavy tails and multivariate responses. It enjoys both sure screening and rank consistency properties under weak assumptions. A two-step approach, with the help of knockoff features, is advocated to specify the threshold for feature screening such that the false discovery rate (FDR) is controlled under a prespecified level. The proposed two-step approach enjoys both sure screening and FDR control simultaneously if the prespecified FDR level is greater than or equal to 1/s, where s is the number of active features. The superior empirical performance of the proposed method is illustrated by simulation examples and real data applications. Supplementary materials for this article are available online.
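
The abstract does not spell out how the knockoff-based threshold is computed, so the following is only a minimal sketch of one generic rule of this kind: the "knockoff+" threshold from the knockoff filter literature. It assumes a hypothetical statistic `w[j]` per feature (for example, a screening utility of feature j minus the utility of its knockoff copy), so that large positive values suggest an active feature. The function names, the statistic, and the toy numbers are illustrative assumptions, not the article's procedure.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: w[j] <= -t}) / max(#{j: w[j] >= t}, 1) <= q,
    or infinity if no candidate threshold satisfies the bound.

    `w` is a hypothetical list of knockoff statistics (an assumption of this
    sketch); `q` is the target FDR level.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Large negative statistics serve as a proxy for false discoveries.
        false_proxy = 1 + sum(1 for x in w if x <= -t)
        selected = max(sum(1 for x in w if x >= t), 1)
        if false_proxy / selected <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Toy statistics: five clearly positive values, five near zero.
w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.4, -0.3, 0.2, 0.1]
picked_at_020 = select_features(w, 0.2)  # level equal to 1/5
picked_at_010 = select_features(w, 0.1)  # level below 1/5
```

Under this rule the estimated ratio is never smaller than 1 divided by the number of selected features, so with five strong features a level of 0.2 selects all five while a level of 0.1 selects none. This gives some intuition for why a condition such as "FDR level at least 1/s" can arise, though the article's own argument may differ.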




Updated: 2020-07-20