To Dereplicate or Not To Dereplicate?
mSphere (IF 3.7) Pub Date: 2020-05-20, DOI: 10.1128/msphere.00971-19
Jacob T Evans, Vincent J Denef

Metagenome-assembled genomes (MAGs) expand our understanding of microbial diversity, evolution, and ecology. Concerns have been raised about how sequencing, assembly, binning, and quality assessment tools may produce MAGs that do not reflect single populations in nature. Here, we reflect on another issue: how to handle highly similar MAGs assembled from independent data sets. Obtaining multiple genomic representatives for a species is highly valuable because it enables population genomic analyses; however, retaining genomes of closely related populations complicates MAG quality assessment and abundance inference. We show that (i) published data sets contain a large fraction of MAGs sharing >99% average nucleotide identity (ANI), (ii) different software packages and parameters used to resolve this redundancy remove very different numbers of MAGs, and (iii) removing closely related genomes leads to the loss of population-specific auxiliary genes. Finally, we highlight some approaches that can infer strain-specific dynamics across a sample series without dereplication.
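To make point (ii) concrete, the sketch below shows one generic way to dereplicate MAGs at an ANI threshold. It is a minimal illustration under stated assumptions, not the procedure of any particular tool or of the authors. The per-MAG quality scores, the pairwise ANI values (as fractions), the genome names, the hypothetical `dereplicate` helper, and the greedy "join the first representative above the threshold" rule are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List


def dereplicate(
    quality: Dict[str, float],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 0.99,
) -> Dict[str, List[str]]:
    """Hypothetical greedy dereplication: returns {representative: members}.

    Assumptions: `quality` is a per-MAG score (higher is better) and `ani`
    holds symmetric pairwise ANI fractions; missing pairs count as 0.0.
    """
    # Visit MAGs from highest to lowest quality (ties broken by name).
    order = sorted(quality, key=lambda g: (-quality[g], g))
    clusters: Dict[str, List[str]] = {}
    for genome in order:
        # Join the first existing representative at or above the threshold.
        for rep in clusters:
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No representative is close enough: this MAG becomes one.
            clusters[genome] = [genome]
    return clusters


# Toy, made-up inputs for illustration only.
quality = {"A": 95.0, "B": 90.0, "C": 80.0, "D": 70.0}
ani = {
    frozenset(("A", "B")): 0.995,
    frozenset(("A", "C")): 0.985,
    frozenset(("B", "C")): 0.992,
    frozenset(("C", "D")): 0.800,
}

for t in (0.999, 0.99, 0.98):
    reps = dereplicate(quality, ani, threshold=t)
    print(t, sorted(reps))
# 0.999 ['A', 'B', 'C', 'D']
# 0.99 ['A', 'C', 'D']
# 0.98 ['A', 'D']
```

On these toy inputs, shifting the threshold from 0.999 to 0.98 shrinks the retained set from four MAGs to two. At 0.99, C stays separate even though it shares 99.2% ANI with B, because B has already been absorbed into A and is not a representative. This kind of dependence on visiting order and assignment rule is one plausible way that different tools and parameter settings can retain different numbers of MAGs. Once B is collapsed into A, any genes present only in B are no longer represented among the retained genomes, which is the kind of loss described in point (iii).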

Updated: 2020-05-20