TOA: A software package for automated functional annotation in non‐model plant species
Molecular Ecology Resources (IF 7.7), Pub Date: 2020-10-18, DOI: 10.1111/1755-0998.13285
Fernando Mora-Márquez 1, Víctor Chano 1, José Luis Vázquez-Poletti 2, Unai López de Heredia 1
The increase in sequencing capacity provided by high‐throughput platforms has made it possible to routinely obtain large sets of genomic and transcriptomic sequences from model and non‐model organisms. Subsequent genomic analysis and gene discovery in next‐generation sequencing experiments are, however, bottlenecked by functional annotation. One common way to functionally annotate sets of sequences obtained from next‐generation sequencing experiments is to search for homologous sequences and access the related functional information deposited in genomic databases. Functional annotation is especially challenging for non‐model organisms, such as many plant species. In such cases, existing free and commercial general‐purpose applications may not offer complete and accurate results. We present TOA (Taxonomy‐oriented annotation), a Python‐based, user‐friendly, open source application designed to establish functional annotation pipelines geared towards non‐model plant species. It can run on Linux/Mac computers, HPCs and cloud servers. TOA performs homology searches against proteins stored in the PLAZA databases, NCBI RefSeq Plant, Nucleotide Database and Non‐Redundant Protein Sequence Database, and outputs functional information from several ontology systems: Gene Ontology, InterPro, EC, KEGG, MapMan and MetaCyc. The software performance was validated by comparing TOA with other functional annotation solutions on several plant benchmark data sets in terms of runtime, total number of annotated sequences and accuracy of the functional information obtained. TOA outperformed the other software in number of annotated sequences and accuracy of the annotation, and constitutes a good alternative for improving functional annotation in plants. TOA is especially recommended for gymnosperms or for low‐quality sequence data sets of non‐model plants.
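The homology-based approach the abstract describes (find homologous sequences, then look up the functional information attached to them) can be illustrated with a minimal Python sketch. This is not TOA's code or API. It assumes the search results are available as BLAST-style tabular text (12 tab-separated columns, e-value in the 11th), and the function names, sequence identifiers, e-value threshold and term mapping are all hypothetical and chosen only for illustration.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-5):
    """Keep the lowest e-value hit per query from BLAST-style tabular text.

    Assumed column layout (illustrative): query id, subject id, ...,
    e-value in column 11, bit score in column 12.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # discard weak hits
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def annotate(hits, subject_to_terms):
    """Transfer the functional terms of each best hit to its query sequence."""
    return {
        query: sorted(subject_to_terms.get(subject, []))
        for query, (subject, _) in hits.items()
    }


# Hypothetical example data: two hits for seq1, one weak hit for seq2.
EXAMPLE_BLAST_TSV = "\n".join([
    "seq1\tprotA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
    "seq1\tprotB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t100",
    "seq2\tprotC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
])

# Hypothetical mapping from database proteins to ontology terms.
EXAMPLE_TERMS = {"protA": {"GO:0008150"}, "protB": {"GO:0003674"}}
```

Calling `annotate(best_hits(EXAMPLE_BLAST_TSV), EXAMPLE_TERMS)` assigns to `seq1` the terms of its strongest hit and leaves `seq2` unannotated, because its only hit fails the e-value threshold. Sequences left without an annotation in this way are the case that motivates searching additional databases.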
Updated: 2020-10-18