Bayesian variable selection for high-dimensional rank data
Environmetrics (IF 1.7), Pub Date: 2021-05-24, DOI: 10.1002/env.2682
Can Cui, Susheela P. Singh, Ana‐Maria Staicu, Brian J. Reich

The study of microbiomes has become a topic of intense interest in the last several decades, as the development of new sequencing technologies has made DNA data accessible across disciplines. In this paper, we analyze a global dataset to investigate environmental factors that affect the topsoil microbiome. As yet, much associated work has focused on linking indicators of microbial health to specific outcomes in various fields, rather than on understanding how external factors may influence the microbiome composition itself. This is partly due to the limited statistical methods available for modeling abundance counts. The counts are high-dimensional, overdispersed, often zero-inflated, and exhibit complex dependence structures. Additionally, the raw counts are often noisy and compositional, and thus are not directly comparable across samples. Practitioners often transform the counts to presence–absence indicators, but this transformation discards much of the data. As an alternative, we propose transforming the counts to taxa ranks and develop a Bayesian variable selection model that uses the ranks to identify covariates that influence microbiome composition. We show by simulation that the proposed model outperforms competitors across various settings, with particular improvement in recall for small-magnitude and low-prevalence covariates. When applied to the topsoil data, the proposed method identifies several factors that affect microbiome composition.
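
To make the contrast between the two transformations concrete, the following is a minimal sketch rather than the authors' procedure. The abstract does not say how ties or zero counts are ranked, so the average-rank convention used here is an assumption, and the counts and the helper names `presence_absence` and `average_ranks` are hypothetical. It illustrates that within-sample ranks keep the ordering information that presence–absence indicators discard, and that ranks do not change when a sample's counts are all scaled by the same factor (for example, by sequencing depth).

```python
def presence_absence(counts):
    """Map each count to 1 if the taxon was observed, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def average_ranks(counts):
    """Rank taxa within one sample (1 = least abundant).

    Tied counts share the average of the positions they occupy.
    This tie rule is an assumption, not taken from the paper.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    ranks = [0.0] * len(counts)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next taxon has the same count.
        while end + 1 < len(order) and counts[order[end + 1]] == counts[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical counts for five taxa in two samples; the second sample has
# exactly three times the sequencing depth of the first.
sample_a = [0, 12, 3, 0, 40]
sample_b = [3 * c for c in sample_a]

ranks_a = average_ranks(sample_a)
ranks_b = average_ranks(sample_b)

print(presence_absence(sample_a))  # which taxa were observed at all
print(ranks_a)                     # ordering of the taxa within the sample
print(ranks_a == ranks_b)          # ranks are unaffected by the common scaling
```

In this toy example the presence–absence vector cannot distinguish the taxon with 3 reads from the one with 40, whereas the ranks can, and the two samples receive identical ranks even though their raw counts differ by a factor of three.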

Updated: 2021-05-24