Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns.
Open Biology ( IF 5.8 ) Pub Date : 2020-09-02 , DOI: 10.1098/rsob.200149
Valerie Wood 1, 2 , Seth Carbon 3 , Midori A Harris 1, 2 , Antonia Lock 4 , Stacia R Engel 5 , David P Hill 6 , Kimberly Van Auken 7 , Helen Attrill 8 , Marc Feuermann 9 , Pascale Gaudet 9 , Ruth C Lovering 10 , Sylvain Poux 9 , Kim M Rutherford 1, 2 , Christopher J Mungall 3

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
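The rule-based check described in the abstract can be illustrated with a minimal sketch. It assumes each rule is simply an unordered pair of process terms that should not be co-annotated to the same gene product, and that annotations are available as a mapping from gene identifiers to sets of term names. The gene names, the data layout and the function `find_violations` are hypothetical and are not taken from the paper; the only rule shown is the example pair the abstract mentions. The sketch also ignores the ontology hierarchy, so it only catches direct co-annotation to the two named terms, not to their descendants.

```python
# Hypothetical rule set: each rule is a pair of process terms that are
# unlikely to be correctly co-annotated to the same gene product.
# The single pair below is the example given in the abstract.
RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}

# Hypothetical annotations: gene identifier -> set of annotated process terms.
EXAMPLE_ANNOTATIONS = {
    "geneA": {"amino acid metabolism", "cytokinesis"},
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolism", "translation"},
}


def find_violations(annotations, rules):
    """Return sorted (gene, term_a, term_b) tuples for every gene that is
    annotated to both terms of a rule."""
    violations = []
    for gene, terms in annotations.items():
        for rule in rules:
            # A rule is violated when both of its terms are in the gene's term set.
            if rule <= terms:
                term_a, term_b = sorted(rule)
                violations.append((gene, term_a, term_b))
    return sorted(violations)


if __name__ == "__main__":
    for gene, term_a, term_b in find_violations(EXAMPLE_ANNOTATIONS, RULES):
        print(f"{gene}: co-annotated to '{term_a}' and '{term_b}'")
```

A flagged gene is only a candidate for review: as the abstract notes, the co-annotation may trace back to literature curation, ontology structure or an automated pipeline, so each hit still has to be traced to its source before anything is corrected.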




Updated: 2020-09-02