Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings, Annals of Statistics



Annals of Statistics (IF 3.2), Pub Date: 2020-12-01, DOI: 10.1214/19-aos1938
Eugene Katsevich, Aaditya Ramdas

While traditional multiple testing procedures prohibit adaptive analysis choices made by users, Goeman and Solari (2011) proposed a simultaneous inference framework that allows users such flexibility while preserving high-probability bounds on the false discovery proportion (FDP) of the chosen set. In this paper, we propose a new class of such simultaneous FDP bounds, tailored for nested sequences of rejection sets. While most existing simultaneous FDP bounds are based on closed testing using global null tests based on sorted p-values, we additionally consider the setting where side information can be leveraged to boost power, the variable selection setting where knockoff statistics can be used to order variables, and the online setting where decisions about rejections must be made as data arrives. Our finite-sample, closed form bounds are based on repurposing the FDP estimates from false discovery rate (FDR) controlling procedures designed for each of the above settings. These results establish a novel connection between the parallel literatures of simultaneous FDP bounds and FDR control methods, and use proof techniques employing martingales and filtrations that are new to both these literatures. We demonstrate the utility of our results by augmenting a recent knockoffs analysis of the UK Biobank dataset.
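To make the terminology concrete, the following minimal sketch shows what the FDP of each set in a nested sequence of rejection sets is, and what it means for a bound to hold simultaneously over the whole sequence. It is an illustration only and does not implement the paper's bounds: the function names `fdp_along_path` and `holds_simultaneously`, the ground-truth labels, and the candidate bounds are all hypothetical. In practice the true null labels are unknown, which is why high-probability bounds are needed.

```python
from typing import List, Sequence


def fdp_along_path(is_null: Sequence[bool]) -> List[float]:
    """FDP of each nested rejection set R_k = first k hypotheses in a fixed ordering.

    is_null[j] is True when the j-th hypothesis in the ordering is a true null,
    so rejecting it counts as a false discovery.
    """
    fdps = []
    false_discoveries = 0
    for k, null in enumerate(is_null, start=1):
        false_discoveries += int(null)
        fdps.append(false_discoveries / k)  # FDP(R_k) = V(R_k) / |R_k|
    return fdps


def holds_simultaneously(fdps: Sequence[float], bounds: Sequence[float]) -> bool:
    """True when the bound covers the FDP of every set on the path at once."""
    return all(f <= b for f, b in zip(fdps, bounds))


# Hypothetical ground truth for six ordered hypotheses (illustration only).
is_null = [False, False, True, False, True, True]
fdps = fdp_along_path(is_null)

# Two hypothetical constant bounds: one covers the whole path, one does not.
print(holds_simultaneously(fdps, [0.5] * 6))
print(holds_simultaneously(fdps, [0.3] * 6))
```

A simultaneous bound in the sense of the abstract is one for which the first kind of check succeeds with high probability over the data, so a user may inspect the whole path and pick any set on it after the fact.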


Updated: 2020-12-01
