Model-Free, Monotone Invariant and Computationally Efficient Feature Screening with Data-adaptive Threshold,arXiv - STAT - Other Statistics



Model-Free, Monotone Invariant and Computationally Efficient Feature Screening with Data-adaptive Threshold
arXiv - STAT - Other Statistics, Pub Date: 2022-07-27, DOI: arxiv-2207.13522
Linsui Deng, Yilin Zhang

Feature screening for ultrahigh-dimensional data generally proceeds in two essential steps. The first step measures and ranks the marginal dependence between the response and the covariates, and the second determines the threshold. We develop a new screening procedure, called the SIT-BY procedure, that possesses appealing statistical properties in both steps. By employing sliced independence estimates in the measuring and ranking stage, our proposed procedure requires no model assumptions, remains invariant to monotone transformations, and achieves almost linear computational complexity. Inspired by false discovery rate (FDR) control procedures, we offer a data-adaptive threshold that benefits from the asymptotic normality of the test statistics. Under moderate conditions, we demonstrate that our procedure can asymptotically control the FDR while maintaining the sure screening property. We investigate the finite-sample performance of our proposed procedure via extensive simulations and an application to a genome-wide dataset.
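The two-step structure described in the abstract (rank marginal dependence, then pick a data-adaptive cutoff) can be illustrated with a minimal sketch. This is not the authors' SIT-BY procedure: the Spearman rank correlation is used here only as a stand-in for the sliced independence estimate, because it is also invariant to monotone transformations and asymptotically normal under independence, although it detects only monotone dependence. The Benjamini-Yekutieli step-up rule is assumed as the FDR-controlling cutoff; the abstract does not state which rule the procedure uses. The names `rank_columns` and `screen_by`, and the level `alpha=0.1`, are hypothetical.

```python
import math
import numpy as np


def rank_columns(a):
    """Ordinal ranks (0..n-1) within each column; assumes continuous data (no ties)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)


def screen_by(X, y, alpha=0.1):
    """Illustrative two-step screening (NOT the paper's SIT-BY statistic).

    Step 1: Spearman rank correlation between y and every column of X,
            used as a stand-in marginal dependence measure.
    Step 2: Benjamini-Yekutieli step-up cutoff on normal-approximation p-values.
    Returns the sorted indices of the retained features.
    """
    n, p = X.shape

    # Step 1: centred ranks, then Pearson correlation of the ranks.
    rx = rank_columns(X)
    ry = rank_columns(y.reshape(-1, 1))
    rx -= rx.mean(axis=0)
    ry -= ry.mean(axis=0)
    rho = (rx * ry).sum(axis=0) / np.sqrt((rx ** 2).sum(axis=0) * (ry ** 2).sum())

    # Under independence, sqrt(n - 1) * rho is approximately N(0, 1).
    z = np.sqrt(n - 1) * np.abs(rho)
    # Two-sided normal p-value: 2 * (1 - Phi(z)) = erfc(z / sqrt(2)).
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])

    # Step 2: BY step-up. Compare the k-th smallest p-value with
    # alpha * k / (p * c_p), where c_p = 1 + 1/2 + ... + 1/p.
    order = np.argsort(pvals)
    c_p = np.sum(1.0 / np.arange(1, p + 1))
    crit = alpha * np.arange(1, p + 1) / (p * c_p)
    passed = np.nonzero(pvals[order] <= crit)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest k whose ordered p-value clears its cutoff
    return np.sort(order[:k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 200))
    # Only the first two features carry signal in this synthetic example.
    y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(400)
    print(screen_by(X, y))
```

Because the statistic depends on the data only through ranks, applying a strictly increasing transformation to `y` or to any column of `X` leaves the selected set unchanged, which is the monotone-invariance property the abstract refers to. The argsort-based ranking costs O(n log n) per feature, which is in the spirit of, but not a demonstration of, the almost linear complexity claimed for the sliced estimate.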


Updated: 2022-07-28
