Clustering co-abundant genes identifies components of the gut microbiome that are reproducibly associated with colorectal cancer and inflammatory bowel disease.
Microbiome (IF 15.5), Pub Date: 2019-08-01, DOI: 10.1186/s40168-019-0722-6
Samuel S Minot, Amy D Willis

BACKGROUND: Whole-genome "shotgun" (WGS) metagenomic sequencing is an increasingly widely used tool for analyzing the metagenomic content of microbiome samples. Although WGS data contain gene-level information, analyzing the millions of microbial genes typically found in microbiome experiments can be challenging. To mitigate the ultrahigh dimensionality of gene-level metagenomics, it has been proposed to cluster genes by co-abundance to form Co-Abundant Gene groups (CAGs). However, exhaustive co-abundance clustering of millions of microbial genes across thousands of biological samples has previously been intractable, purely because of the computational cost of performing trillions of pairwise comparisons.

RESULTS: Here we present a novel computational approach to the analysis of WGS datasets in which microbial gene groups are the fundamental unit of analysis. We use the Approximate Nearest Neighbor heuristic to perform near-exhaustive average linkage clustering, grouping millions of genes by co-abundance. This yields thousands of high-quality CAGs representing complete and partial microbial genomes. We applied this method to publicly available WGS microbiome surveys and found that the microbial CAGs associated with inflammatory bowel disease (IBD) and colorectal cancer (CRC) were highly reproducible and could be validated in multiple independent cohorts.

CONCLUSIONS: This approach to gene-level metagenomics provides a powerful path forward for identifying the biological links between the microbiome and human health. By proposing a new computational approach for handling high-dimensional metagenomics data, we identified specific disease-associated microbial gene groups that can be used to select strains of interest for further preclinical and mechanistic experimentation.
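
The co-abundance clustering idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes cosine distance between per-sample gene abundance profiles, and it substitutes random-hyperplane locality-sensitive hashing (LSH) for the paper's Approximate Nearest Neighbor index, so that average-linkage merges are only considered between clusters joined by at least one gene pair that shares a hash bucket. All function names, the 0.1 distance threshold, the LSH parameters, and the toy abundance values are hypothetical. The `exhaustive_pairs` helper shows why pruning matters: n genes require n(n-1)/2 comparisons, roughly 2 × 10^12 for a hypothetical set of 2 million genes.

```python
import math
import random
from itertools import combinations


def exhaustive_pairs(n_genes):
    """Pairwise comparisons needed for exhaustive clustering: n(n-1)/2."""
    return n_genes * (n_genes - 1) // 2


def cosine_distance(a, b):
    """1 - cosine similarity of two per-sample abundance profiles (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def candidate_pairs(profiles, n_planes=4, n_tables=8, seed=0):
    """Random-hyperplane LSH standing in for an ANN index (hypothetical choice):
    only genes sharing a hash bucket in at least one table become candidates."""
    rng = random.Random(seed)
    dim = len(next(iter(profiles.values())))
    pairs = set()
    for _ in range(n_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        buckets = {}
        for gene, prof in profiles.items():
            # One bit per hyperplane: which side of the plane the profile lies on.
            signature = tuple(sum(p * x for p, x in zip(plane, prof)) >= 0 for plane in planes)
            buckets.setdefault(signature, []).append(gene)
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs


def average_linkage_cags(profiles, threshold=0.1, **lsh_kwargs):
    """Greedy agglomerative average-linkage clustering, restricted to clusters
    linked by an LSH candidate pair; stops when the closest linked pair of
    clusters is farther apart than `threshold`."""
    candidates = candidate_pairs(profiles, **lsh_kwargs)
    cluster_of = {g: g for g in profiles}  # gene -> cluster id
    members = {g: [g] for g in profiles}   # cluster id -> member genes

    def average_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        dists = [cosine_distance(profiles[a], profiles[b])
                 for a in members[c1] for b in members[c2]]
        return sum(dists) / len(dists)

    while True:
        linked = {tuple(sorted((cluster_of[a], cluster_of[b])))
                  for a, b in candidates if cluster_of[a] != cluster_of[b]}
        if not linked:
            break
        best_d, c1, c2 = min((average_distance(x, y), x, y) for x, y in linked)
        if best_d > threshold:
            break
        members[c1].extend(members.pop(c2))
        for g in members[c1]:
            cluster_of[g] = c1
    return sorted(sorted(m) for m in members.values())


# Hypothetical toy abundances: each gene's counts across 6 samples.
TOY_PROFILES = {
    "geneA1": [10, 0, 5, 20, 1, 8],
    "geneA2": [20, 0, 10, 41, 2, 16],
    "geneA3": [5, 0, 3, 10, 0, 4],
    "geneB1": [0, 12, 3, 1, 15, 2],
    "geneB2": [0, 24, 6, 2, 29, 4],
    "geneB3": [1, 6, 1, 0, 8, 1],
}

if __name__ == "__main__":
    print(exhaustive_pairs(2_000_000))  # 1999999000000, about 2 trillion
    print(average_linkage_cags(TOY_PROFILES))
```

On the toy data, genes whose abundances rise and fall together across samples end up in the same group, while the two groups stay apart because their average distance exceeds the threshold. The hashing step bounds the number of comparisons by bucket size rather than by n², which is the motivation the abstract gives for an approximate neighbor search; the exact average-linkage computation here is only practical at toy scale.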

Updated: 2019-08-01