bitacora: A comprehensive tool for the identification and annotation of gene families in genome assemblies.
Molecular Ecology Resources (IF 5.5), Pub Date: 2020-06-03, DOI: 10.1111/1755-0998.13202
Joel Vizueta, Alejandro Sánchez-Gracia, Julio Rozas

Gene annotation is a critical bottleneck in genomic research, especially for the comprehensive study of very large gene families in the genomes of nonmodel organisms. Despite recent progress in automatic methods, the state-of-the-art tools used for this task often produce inaccurate annotations, such as fused, chimeric, partial or even completely absent gene models for many family copies, errors that take considerable extra effort to correct. Here we present bitacora, a bioinformatics solution that integrates popular sequence similarity-based search tools and Perl scripts to facilitate both the curation of these inaccurate annotations and the identification of previously undetected gene family copies directly in genomic DNA sequences. We tested bitacora by annotating the members of two chemosensory gene families with different repertoire sizes in seven available genome sequences, and compared its performance with that of augustus-ppx, a tool also designed to improve automatic annotations using a sequence similarity-based approach. Despite the relatively high fragmentation of some of these draft genomes, bitacora improved the annotation of many members of these families and detected thousands of new chemoreceptors encoded in the genome sequences. The program creates general feature format (GFF) files, with both curated and newly identified gene models, and FASTA files with the predicted proteins. These outputs can be easily integrated into genomic annotation editors, greatly facilitating subsequent manual annotation and downstream evolutionary analyses.
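
As a rough illustration of how GFF and FASTA outputs of the kind described above might be consumed downstream, the following Python sketch counts gene models per scaffold and checks that each gene has a predicted protein. It assumes only the standard nine-column, tab-separated GFF layout and plain FASTA. The example records, the identifiers (`scaffold_1`, `geneA`, `geneB`), and the function names are hypothetical and are not taken from bitacora or its actual output.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gff(lines: Iterable[str]) -> List[Feature]:
    """Parse nine-column GFF lines, skipping comments and blank lines."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a standard GFF record
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = value.strip()
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)
        )
    return features


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record name: sequence}, joining sequences split over lines."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Hypothetical example data, made up for this sketch.
EXAMPLE_GFF = (
    "##gff-version 3\n"
    "scaffold_1\texample\tgene\t100\t899\t.\t+\t.\tID=geneA\n"
    "scaffold_1\texample\tCDS\t100\t399\t.\t+\t0\tID=geneA.cds1;Parent=geneA\n"
    "scaffold_1\texample\tCDS\t600\t899\t.\t+\t0\tID=geneA.cds2;Parent=geneA\n"
    "scaffold_2\texample\tgene\t50\t349\t.\t-\t.\tID=geneB\n"
)
EXAMPLE_FASTA = ">geneA\nMKTL\nLVAG\n>geneB\nMSLQ\n"

features = parse_gff(EXAMPLE_GFF.splitlines())
proteins = parse_fasta(EXAMPLE_FASTA.splitlines())

# Count gene models per scaffold.
genes_per_scaffold = Counter(f.seqid for f in features if f.type == "gene")

# Gene IDs that have no matching protein record.
gene_ids = [f.attributes["ID"] for f in features if f.type == "gene"]
missing = sorted(g for g in gene_ids if g not in proteins)

print(dict(genes_per_scaffold))
print("genes without a protein:", missing)
```

Keeping the parsers free of third-party dependencies makes the sketch easy to adapt; for real annotation files, an established GFF or FASTA library would likely handle edge cases (escaped attribute values, multi-parent features) more robustly.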

Updated: 2020-06-03