Stability of SARS-CoV-2 phylogenies
PLOS Genetics (IF 4.0), Pub Date: 2020-11-18, DOI: 10.1371/journal.pgen.1009175
Yatish Turakhia 1, 2 , Nicola De Maio 3 , Bryan Thornlow 1, 2 , Landen Gozashti 1, 2, 4 , Robert Lanfear 5 , Conor R Walker 3, 6 , Angie S Hinrichs 2 , Jason D Fernandes 1, 2, 7 , Rui Borges 8 , Greg Slodkowicz 9 , Lukas Weilguny 3 , David Haussler 1, 2, 7 , Nick Goldman 3 , Russell Corbett-Detig 1, 2

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites, and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and can make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies, and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors, and establish a widely shared, stable clade structure for more accurate scientific inference and discourse.
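
As a rough illustration of the screening idea described in the abstract, the sketch below flags genome positions whose mutations are reported predominantly by a single lab and then masks those positions in a sequence. It is not the authors' toolkit: the function names, the input layout (one `(position, lab)` pair per sample carrying a mutation) and the thresholds `min_count` and `min_fraction` are all assumptions made for this example. It also simplifies by counting carrier samples per lab rather than counting recurrent mutation events on a phylogeny.

```python
from collections import Counter, defaultdict


def flag_lab_associated(observations, min_count=3, min_fraction=0.8):
    """Return positions whose mutations come mostly from one lab.

    observations: iterable of (position, lab) pairs, one per sample that
    carries a mutation at that position (hypothetical input layout).
    min_count, min_fraction: illustrative thresholds, not values from the paper.
    """
    labs_by_position = defaultdict(Counter)
    for position, lab in observations:
        labs_by_position[position][lab] += 1

    flagged = {}
    for position, counts in labs_by_position.items():
        total = sum(counts.values())
        top_lab, top_count = counts.most_common(1)[0]
        fraction = top_count / total
        # Flag only positions seen often enough and dominated by one lab.
        if total >= min_count and fraction >= min_fraction:
            flagged[position] = (top_lab, fraction)
    return flagged


def mask_sites(sequence, positions, mask_char="N"):
    """Replace the given 1-based positions in a sequence with mask_char."""
    chars = list(sequence)
    for position in positions:
        if 1 <= position <= len(chars):
            chars[position - 1] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # Toy data: position 100 is reported four times by lab A and once by lab B.
    toy = [(100, "A")] * 4 + [(100, "B"), (200, "A"), (200, "B"), (200, "C")]
    suspect = flag_lab_associated(toy)
    print(suspect)
    print(mask_sites("ACGTACGT", [2, 5]))
```

In practice a flagged position would be a candidate for closer inspection (for example, checking whether it overlaps a primer binding site) before masking, since a lab sampling a genuine local outbreak can also dominate the reports for a real variant.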




Updated: 2020-11-19