GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms.
GigaScience ( IF 9.2 ) Pub Date : 2020-02-01 , DOI: 10.1093/gigascience/giaa008
Patrick Denis Browne 1, 2 , Tue Kjærgaard Nielsen 1, 2 , Witold Kot 1, 2 , Anni Aggerholm 3 , M Thomas P Gilbert 4 , Lara Puetz 4 , Morten Rasmussen 5 , Athanasios Zervas 2 , Lars Hestbjerg Hansen 1, 2

BACKGROUND Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insight into the genetic content of the biological samples studied, the conclusions drawn from it are at risk from biases inherent to DNA sequencing methods, including abundance estimates that are inaccurate as a function of genomic guanine-cytosine (GC) content.

RESULTS We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among library preparation protocols and sequencing platforms. We found that our workflows using MiSeq and NextSeq were hindered by major GC biases, with problems becoming increasingly severe outside the 45-65% GC range. This led to falsely low coverage in GC-rich and especially GC-poor sequences: genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage biases. The PacBio and HiSeq platforms showed GC bias profiles similar to each other, and these were distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted by GC bias.

CONCLUSIONS These findings point to potential difficulties in genome sequencing, arising from GC biases, that could be pre-emptively addressed with methodological optimizations, provided that the GC biases inherent to the relevant workflow are understood. Furthermore, we recommend a more critical approach to quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or they should use a demonstrably unbiased workflow.
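The abstract expresses GC bias as the coverage of genomic windows as a function of their GC content (e.g. 30% GC windows receiving >10-fold less coverage than windows near 50% GC). The sketch below shows one generic way to compute such a profile and to apply a simple per-bin correction to an abundance estimate. It is not the authors' pipeline: the function names, the fixed non-overlapping 100 bp windows, the 10% GC bins, and the assumption that per-base depths are already available as a list are all hypothetical choices for illustration, and the divide-by-relative-coverage correction is a plain reweighting rather than a method proposed in the paper.

```python
# Hypothetical sketch: GC bias profile = mean window coverage per GC bin / overall mean.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence


def gc_bin(seq: str, bin_width: int = 10) -> Optional[int]:
    """Return the lower edge (in %) of seq's GC bin, or None if it has no A/C/G/T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return None
    # Integer arithmetic avoids float rounding (0.3 / 0.1 is slightly below 3).
    pct = 100 * gc // acgt
    return (pct // bin_width) * bin_width


def gc_bias_profile(genome: str, depth: Sequence[float], window: int = 100,
                    bin_width: int = 10) -> Dict[int, float]:
    """Relative coverage per GC bin: 1.0 = unbiased, 0.1 = ~10-fold under-covered."""
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have the same length")
    per_bin: Dict[int, List[float]] = defaultdict(list)
    all_windows: List[float] = []
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(genome) - window + 1, window):
        b = gc_bin(genome[start:start + window], bin_width)
        if b is None:
            continue
        cov = mean(depth[start:start + window])
        per_bin[b].append(cov)
        all_windows.append(cov)
    if not all_windows or mean(all_windows) == 0:
        raise ValueError("no covered windows to build a profile from")
    overall = mean(all_windows)
    return {b: mean(covs) / overall for b, covs in sorted(per_bin.items())}


def correct_coverage(observed: float, seq: str, profile: Dict[int, float],
                     bin_width: int = 10) -> float:
    """Divide observed coverage by the relative coverage of seq's GC bin."""
    b = gc_bin(seq, bin_width)
    if b is None or b not in profile or profile[b] == 0:
        return observed  # no usable estimate for this bin; leave uncorrected
    return observed / profile[b]
```

For a toy genome made of two 50% GC windows at 20x depth and one 30% GC window at 2x depth, the profile maps bin 50 to about 1.43 and bin 30 to about 0.14 (a 10-fold difference), and correcting the 30% GC window's 2x coverage gives roughly the genome-wide mean of 14x. Real data would need smoothing, minimum counts per bin, and mappability filtering, which this sketch deliberately omits.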

Updated: 2020-02-13