VADR: validation and annotation of virus sequence submissions to GenBank.
BMC Bioinformatics ( IF 2.9 ) Pub Date : 2020-05-24 , DOI: 10.1186/s12859-020-3537-3
Alejandro A Schäffer 1, 2 , Eneida L Hatcher 2 , Linda Yankie 2 , Lara Shonkwiler 2, 3 , J Rodney Brister 2 , Ilene Karsch-Mizrachi 2 , Eric P Nawrocki 2

BACKGROUND: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.

RESULTS: We developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of "alerts" that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank's submission processing pipeline, allowing for viral submissions passing all tests to be accepted and annotated automatically, without the need for any human (GenBank indexer) intervention. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Since March 2020, VADR has also been used to check SARS-CoV-2 sequence submissions. Other viruses with high numbers of submissions will be added incrementally.

CONCLUSION: VADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.
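
The classify-then-check flow described in the RESULTS can be illustrated with a toy sketch. This is not VADR's code or interface: the function names, the score dictionary, the feature-check dictionary, and the alert strings are all hypothetical, and the sketch treats every alert as blocking, which is a simplification of "passing all tests". It only shows the shape of the logic: pick the best-scoring reference model, collect alerts for anything unexpected, and accept a sequence automatically only when no alerts remain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Result:
    """Outcome for one submitted sequence (hypothetical structure)."""
    seq_name: str
    best_model: Optional[str]
    alerts: List[str] = field(default_factory=list)

    @property
    def passes(self) -> bool:
        # Simplifying assumption: any alert blocks automatic acceptance.
        return not self.alerts


def classify(scores: Dict[str, float], min_score: float = 0.0) -> Optional[str]:
    """Return the best-scoring reference model, or None if nothing scores above min_score."""
    if not scores:
        return None
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] > min_score else None


def validate(seq_name: str,
             scores: Dict[str, float],
             feature_checks: Dict[str, bool]) -> Result:
    """Classify a sequence, then record one alert per failed feature check."""
    model = classify(scores)
    result = Result(seq_name, model)
    if model is None:
        # Hypothetical alert name, not one of VADR's actual alert types.
        result.alerts.append("no_matching_model")
        return result
    for feature, ok in feature_checks.items():
        if not ok:
            result.alerts.append(f"feature_problem:{feature}")
    return result
```

In this sketch, `scores` stands in for the per-model classification scores and `feature_checks` stands in for the outcome of the alignment-based and protein-based checks on each annotated feature; a real run would derive both from the models and alignments the abstract describes.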

Updated: 2020-05-24