Compositional knockoff filter for high-dimensional regression analysis of microbiome data
Biometrics (IF 1.4), Pub Date: 2020-07-19, DOI: 10.1111/biom.13336
Arun Srinivasan 1, Lingzhou Xue 1, Xiang Zhan 2

A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine-mapping of the microbiome, we propose a two-step compositional knockoff filter that provides effective finite-sample false discovery rate (FDR) control in high-dimensional linear log-contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure that removes insignificant microbial taxa while retaining the essential sum-to-zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. A subset of microbes related to the response is thereby selected from the high-dimensional microbial taxa under a prespecified FDR threshold. We study the theoretical properties of the proposed two-step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies that compare our method with existing ones and show that the new method gains power while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with an application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expression.
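The selection step of a knockoff filter can be illustrated with a minimal sketch. The abstract does not specify how the feature statistics are computed, so the code below assumes they are already available as a vector `W` (one value per taxon retained after screening, where a large positive value suggests a real signal and the sign of a null taxon is symmetric). It applies the standard knockoff+ data-dependent threshold from the general knockoff filter literature. The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical, and this is not claimed to be the authors' implementation.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in ascending order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(W, q):
    """Return indices j with W_j at or above the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


if __name__ == "__main__":
    # Assumed, made-up statistics for ten taxa (illustration only).
    W_demo = [5, 4, 3, -1, 2, 0.5, -0.2, 6, 7, 8]
    print(knockoff_select(W_demo, q=0.2))
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold. It makes the rule more conservative, and when few statistics are large it can return no selections at all.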

Updated: 2020-07-19