CheckV assesses the quality and completeness of metagenome-assembled viral genomes
Nature Biotechnology (IF 33.1), Pub Date: 2020-12-21, DOI: 10.1038/s41587-020-00774-7
Stephen Nayfach 1 , Antonio Pedro Camargo 2 , Frederik Schulz 1 , Emiley Eloe-Fadrosh 1 , Simon Roux 1 , Nikos C Kyrpides 1

Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
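As a rough illustration of the ">90% complete" threshold mentioned above, the following minimal sketch computes completeness as the ratio of a contig's length to the expected length of a matched complete reference genome. This ratio is an assumption made for illustration and is not taken from the abstract, which does not describe CheckV's actual procedure. The function names, the contig names, and all lengths are hypothetical.

```python
def estimate_completeness(contig_length: int, expected_genome_length: int) -> float:
    """Return completeness as a percentage, capped at 100.

    Assumption for illustration only: completeness is the contig length
    divided by the expected length of a matched complete reference genome.
    """
    if contig_length < 0 or expected_genome_length <= 0:
        raise ValueError("lengths must be non-negative and the expected length positive")
    return min(100.0, 100.0 * contig_length / expected_genome_length)


def is_high_quality(completeness: float, threshold: float = 90.0) -> bool:
    """Apply the abstract's high-quality criterion: strictly more than 90% complete."""
    return completeness > threshold


# Hypothetical contigs: name -> (contig length, expected genome length), in base pairs.
contigs = {
    "contig_a": (45_000, 48_000),
    "contig_b": (5_000, 40_000),
    "contig_c": (36_000, 40_000),
}

high_quality = [
    name
    for name, (length, expected) in contigs.items()
    if is_high_quality(estimate_completeness(length, expected))
]
print(high_quality)
```

Here only contig_a passes, because contig_c sits exactly at 90% and the criterion is a strict inequality.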




Updated: 2020-12-21