Automated generation of gene summaries at the Alliance of Genome Resources.
Database: The Journal of Biological Databases and Curation (IF 5.8). Pub Date: 2020-06-19. DOI: 10.1093/database/baaa037
Ranjana Kishore, Valerio Arnaboldi, Ceri E Van Slyke, Juancarlos Chan, Robert S Nash, Jose M Urbano, Mary E Dolan, Stacia R Engel, Mary Shimoyama, Paul W Sternberg, The Alliance of Genome Resources

Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
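
The two steps the abstract describes, grouping terms under informative common ancestors and filling a sentence template, can be illustrated with a minimal sketch. This is not the Alliance implementation: the toy ontology, the function names (`ancestors`, `information_content`, `group_terms`, `build_sentence`), the greedy merging strategy, the `min_ic` threshold and the structure-based definition of information content are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy ontology (child -> parents); not real GO or DO data.
PARENTS = {
    "root": [],
    "signaling": ["root"],
    "metabolism": ["root"],
    "kinase signaling": ["signaling"],
    "receptor signaling": ["signaling"],
    "lipid metabolism": ["metabolism"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Assumed IC: -log of the fraction of ontology terms at or below `term`."""
    covered = sum(1 for t in PARENTS if term in ancestors(t))
    return -math.log(covered / len(PARENTS))


def group_terms(terms, max_terms, min_ic=0.1):
    """Greedily replace terms with their most informative common ancestor
    until at most `max_terms` remain or no ancestor reaches `min_ic`."""
    terms = set(terms)
    while len(terms) > max_terms:
        best = None  # (ic, ancestor) of the best merge candidate so far
        for a, b in combinations(sorted(terms), 2):
            for anc in sorted(ancestors(a) & ancestors(b)):
                ic = information_content(anc)
                if ic >= min_ic and (best is None or ic > best[0]):
                    best = (ic, anc)
        if best is None:
            break  # only uninformative ancestors (e.g. the root) are left
        anc = best[1]
        # Drop every term covered by the chosen ancestor, then add the ancestor.
        terms = {t for t in terms if anc not in ancestors(t)} | {anc}
    return sorted(terms)


def build_sentence(gene, terms, template="{gene} is involved in {terms}."):
    """Fill a sentence template with a comma-and-'and' joined term list."""
    if not terms:
        return ""
    if len(terms) > 1:
        joined = ", ".join(terms[:-1]) + " and " + terms[-1]
    else:
        joined = terms[0]
    return template.format(gene=gene, terms=joined)


annotations = ["kinase signaling", "receptor signaling", "lipid metabolism"]
grouped = group_terms(annotations, max_terms=2)
print(build_sentence("abc-1", grouped))
```

In this sketch the two signaling terms collapse into their shared parent, while the root is never chosen because its information content is zero. A production system would presumably compute information content from annotation frequencies and use category-specific templates, neither of which is modeled here.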

Updated: 2020-06-19