RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs, Journal of the American Statistical Association



RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs
Journal of the American Statistical Association (IF 3.0), Pub Date: 2019-04-11, DOI: 10.1080/01621459.2018.1546589
Yingying Fan₁, Emre Demirkaya₁, Gaorong Li₂, Jinchi Lv₁


Abstract: Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure recently introduced by Candès, Fan, Janson and Lv, in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from the ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate the unknown covariate distribution. We provide theoretical justification for the robustness of our modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real dataset is analyzed to further assess the performance of the suggested knockoffs procedure. Supplementary materials for this article are available online.
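
The abstract refers to controlling the FDR at a target level with knockoffs. As a rough illustration only, the sketch below shows the generic knockoff+ selection step from the wider knockoffs literature: given one statistic W_j per variable (large positive values favour the original variable over its knockoff), it picks the smallest threshold t at which the estimated false discovery proportion (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) falls to the target level q or below. This is not the RANK construction itself. The function names and the example values of W are hypothetical, and how W is computed (knockoff generation, model fitting) is assumed to happen elsewhere.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    Scans the nonzero |W_j| values in increasing order and returns the first t
    with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, or inf if none exists.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(W, q):
    """Indices of the variables whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Hypothetical statistics, for illustration only.
W_example = [5, 4, 3, 2.5, -1, 2, 0.5, -0.2, 6, 7]
selected = knockoff_select(W_example, q=0.2)
```

When no threshold meets the target level, the threshold is infinite and the selected set is empty, which is the conservative behaviour one would expect from this rule.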


Updated: 2019-04-11
