Covariate Adaptive False Discovery Rate Control With Applications to Omics-Wide Multiple Testing, Journal of the American Statistical Association



Journal of the American Statistical Association (IF 3.0), Pub Date: 2020-08-17, DOI: 10.1080/01621459.2020.1783273
Xianyang Zhang, Jun Chen


Abstract

Conventional multiple testing procedures often assume that hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this article, we introduce a false discovery rate (FDR) control procedure for large-scale inference problems that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g., strongly mixing). Extensive simulations are conducted to study the finite-sample performance of the proposed method, and we demonstrate that the new approach improves over state-of-the-art approaches by being flexible, robust, powerful, and computationally efficient. Finally, we apply the method to several omics datasets arising from genomics studies, with the aim of identifying omics features associated with clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed covariate adaptive multiple testing procedure is implemented in the R package CAMT. Supplementary materials for this article are available online.
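
To make the idea of covariate-informed FDR control concrete, the following is a minimal sketch in Python. It is not the CAMT procedure and does not use the CAMT package. It swaps in a simpler, classical technique: the Benjamini-Hochberg (BH) step-up rule, plus a weighted variant in which each p-value is divided by a fixed weight that averages to one. The function names, the simulated covariate, the assumed signal probability, and the oracle weights in `demo` are all hypothetical choices made for illustration.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.1):
    """Return a boolean rejection mask from the standard BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared with alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def weighted_bh(pvals, weights, alpha=0.1):
    """Weighted BH: run BH on p_i / w_i after rescaling weights to mean 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()
    q = np.minimum(np.asarray(pvals, dtype=float) / w, 1.0)
    return benjamini_hochberg(q, alpha)


def demo(seed=0, m=2000):
    """Simulate p-values whose signal probability depends on a covariate."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=m)          # hypothetical covariate
    prior = 0.02 + 0.3 * x**4        # assumed P(signal | x), illustration only
    is_signal = rng.uniform(size=m) < prior
    # Null p-values are uniform; signal p-values are Beta(0.1, 1), near zero.
    pvals = np.where(is_signal, rng.beta(0.1, 1.0, size=m), rng.uniform(size=m))
    plain = benjamini_hochberg(pvals)
    # Oracle weights taken from the true prior: unavailable in practice.
    weighted = weighted_bh(pvals, weights=prior)
    return int(plain.sum()), int(weighted.sum())
```

Calling `demo()` returns the number of rejections made by plain BH and by the weighted variant on one simulated dataset. The weights here are fixed in advance and do not depend on the p-values. A method such as the one described in the abstract must instead learn how the covariate relates to the signal from the data, which is the harder problem the article addresses.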


Updated: 2020-08-17
