Assessment of urban microbiome assemblies with the help of targeted in silico gold standards.
Biology Direct ( IF 5.5 ) Pub Date : 2018-10-12 , DOI: 10.1186/s13062-018-0225-6
Samuel M Gerner 1, 2 , Thomas Rattei 2 , Alexandra B Graf 1

BACKGROUND Microbial communities play a crucial role in our environment and may strongly influence human health. Although urban environments are where human interaction is most abundant, we still know little about the urban microbiome. This is highlighted by the large number of unclassified DNA reads found in urban metagenome samples. The only in silico approach that allows us to find unknown species is the assembly and classification of draft genomes from a metagenomic dataset. In this study we (1) investigate the applicability of an assembly and binning approach to urban metagenome datasets, and (2) develop a new method for generating in silico gold standards, both to better understand the specific challenges of such datasets and to guide the selection of available software.

RESULTS We applied combinations of three assembly tools (Megahit, SPAdes and MetaSPAdes) and three binning tools (MaxBin, MetaBAT and CONCOCT) to whole genome shotgun datasets from the CAMDA 2017 Challenge. Complex in silico gold standards with a simulated bacterial fraction were generated for representative samples of each surface type and city. Using these gold standards, we found the combination of SPAdes and MetaBAT to be optimal for urban metagenome datasets, because it provided the best trade-off between the number of high-quality draft genome bins retrieved (per MIMAG standards) and the amount of misassembly and contamination. The assembled draft genomes included known species such as Propionibacterium acnes, but also novel species according to their respective ANI values.

CONCLUSIONS We showed that, even for datasets from urban environments with high diversity and low sequencing depth, assembly and binning-based methods can provide high-quality genome drafts. Sequencing depth is vital for retrieving high-quality genome drafts, but a high proportion of bacterial sequence in the sample is even more important for achieving high coverage of bacterial genomes. In contrast to read-based methods, which rely on database knowledge, genome-centric methods as applied in this study can provide valuable information about unknown species and strains, as well as about the functional contributions of single community members within a sample. Furthermore, we present a method for generating sample-specific, highly complex in silico gold standards.

REVIEWERS This article was reviewed by Craig Herbold, Serghei Mangul and Yana Bromberg.
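
The conclusion that the bacterial sequence fraction matters even more than raw sequencing depth can be illustrated with a rough coverage estimate. The sketch below is not taken from the paper: it assumes the standard approximation that mean coverage equals sequenced bases divided by genome size, and the function name, its parameters and all example numbers are hypothetical.

```python
def expected_coverage(total_reads, read_length_bp, bacterial_fraction,
                      relative_abundance, genome_size_bp):
    """Rough mean coverage of one bacterial genome in a metagenome sample.

    Assumption (not from the paper): coverage = bases assigned to the
    genome / genome size, with reads spread uniformly over the genome.
    """
    for name, value in (("bacterial_fraction", bacterial_fraction),
                        ("relative_abundance", relative_abundance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")

    # Bases in the sample that come from bacteria at all.
    bacterial_bases = total_reads * read_length_bp * bacterial_fraction
    # Share of those bases that belongs to the genome of interest.
    genome_bases = bacterial_bases * relative_abundance
    return genome_bases / genome_size_bp


if __name__ == "__main__":
    # Hypothetical sample: 10 M reads of 100 bp, target genome of 2.5 Mbp
    # making up 5% of the bacterial community.
    base = dict(total_reads=10_000_000, read_length_bp=100,
                relative_abundance=0.05, genome_size_bp=2_500_000)
    for fraction in (0.2, 0.8):
        cov = expected_coverage(bacterial_fraction=fraction, **base)
        print(f"bacterial fraction {fraction:.0%}: ~{cov:.1f}x coverage")
```

Under these assumed numbers, raising the bacterial fraction from 20% to 80% quadruples the estimated coverage at the same sequencing depth, whereas doubling the number of reads only doubles it.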

Updated: 2020-04-22