Quality assessment of gene repertoire annotations with OMArk
Nature Biotechnology (IF 46.9) Pub Date: 2024-02-21, DOI: 10.1038/s41587-024-02147-w
Yannis Nevers, Alex Warwick Vesztrocy, Victor Rossier, Clément-Marie Train, Adrian Altenhoff, Christophe Dessimoz, Natasha M. Glover

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.




Updated: 2024-02-21