Model-based biclustering for overdispersed count data with application in microbial ecology
Methods in Ecology and Evolution (IF 6.3), Pub Date: 2021-02-26, DOI: 10.1111/2041-210x.13582
Julie Aubert 1 , Sophie Schbath 2 , Stéphane Robin 1

  1. Several studies have shown that microbial communities living in animals (humans included) and in or around plants have a significant impact on the health and disease of their hosts and on various services, such as adaptation to stressful environments. The basic input for studying microbiomes is a matrix of micro-organism abundances across sampling units. Such a matrix typically corresponds to taxonomic profiles derived from the high-throughput sequencing of environmental samples. Biclustering is one way to study the interactions between the structure of micro-organism communities and the environmental samples they come from.
  2. We propose a latent block model (LBM) and an associated inference procedure for the biclustering of rows and columns of abundance matrices. The LBM assumes that micro-organisms (rows) and environmental samples (columns) can both be clustered into groups characterizing preferential interaction or avoidance. We use the Poisson–Gamma distribution to model the overdispersion observed in microbial abundance data, and introduce row and column effects to account for the mean abundance of each micro-organism and the sequencing effort in each sample, respectively. Because the latent variables are not independent conditionally on the observed data, classical maximum likelihood inference is intractable. We therefore derive a variational inference algorithm and propose a strategy to select the number of biclusters.
  3. We illustrate the flexibility and performance of our approach on both a simulation study and three ecological datasets. The model-based framework lets us adapt to the peculiarities of microbial ecological abundance data and explore relationships between entities of two different natures.
  4. We implemented our method in the cobiclust R package, available on CRAN, and built a website with usage examples (https://julieaubert.github.io/cobiclust/cobiclust-example1.html).
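
The generative model described in point 2 can be illustrated with a small simulation. The sketch below is not the cobiclust API and is not taken from the paper: the function names, the uniform group assignment, and the choice of a Gamma factor with mean 1 and variance 1/a are all assumptions made for illustration. It draws a row group for each micro-organism and a column group for each sample, then draws each count from a Poisson distribution whose rate is the product of a row effect, a column effect, a block interaction term, and a Gamma-distributed overdispersion factor.

```python
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(n_rows, n_cols, alpha, row_effects, col_effects, a, seed=0):
    """Simulate an abundance matrix from a Poisson-Gamma latent block model.

    Assumed (hypothetical) parameterization:
      alpha[k][g]    interaction term for row group k and column group g
      row_effects[i] mean abundance effect of micro-organism i
      col_effects[j] sequencing-effort effect of sample j
      a              Gamma shape; smaller values give stronger overdispersion
    """
    rng = random.Random(seed)
    n_row_groups, n_col_groups = len(alpha), len(alpha[0])
    # Latent group labels, drawn uniformly here for simplicity (an assumption).
    z = [rng.randrange(n_row_groups) for _ in range(n_rows)]
    w = [rng.randrange(n_col_groups) for _ in range(n_cols)]
    counts = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Gamma(shape=a, scale=1/a) has mean 1 and variance 1/a.
            u = rng.gammavariate(a, 1.0 / a)
            lam = row_effects[i] * col_effects[j] * alpha[z[i]][w[j]] * u
            row.append(sample_poisson(rng, lam))
        counts.append(row)
    return counts, z, w


if __name__ == "__main__":
    # Two row groups and two column groups with a checkerboard interaction pattern.
    alpha = [[1.0, 5.0], [5.0, 1.0]]
    counts, z, w = simulate_counts(
        n_rows=6, n_cols=4, alpha=alpha,
        row_effects=[2.0] * 6, col_effects=[1.5] * 4, a=2.0, seed=1,
    )
    for label, row in zip(z, counts):
        print(label, row)
```

Mixing the Poisson rate over a Gamma factor gives counts that are marginally negative binomial, which is one common way to represent overdispersion. Inference runs in the opposite direction: the groups `z` and `w` are unobserved and have to be recovered from `counts`, which is the step the variational algorithm in point 2 addresses.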


Updated: 2021-02-26