A systematic analysis of term reuse and term overlap across biomedical ontologies
Semantic Web (IF 3), Pub Date: 2017-08-07, DOI: 10.3233/sw-160238
Maulik R Kamdar, Tania Tudorache, Mark A Musen

Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse promises to support semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap of 25-31%, term reuse is below 9%, with most ontologies reusing fewer than 5% of their terms from a small set of popular ontologies. Clustering analysis shows that the terms reused by a common set of ontologies have >90% semantic similarity, hinting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analyzing the logs generated from a Protégé plugin that enables developers to reuse terms from BioPortal. We find that most reuse constructs were 2-level subtrees on the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns indicating that ontology developers did intend to reuse terms from other ontologies, but used different and sometimes incorrect representations. Our results point to the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.
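
The distinction between term reuse and term overlap can be illustrated with a minimal sketch. It assumes, for illustration only, that reuse means two ontologies share the same term IRI, and that overlap means two terms carry the same normalized label under different IRIs; the abstract itself does not spell out these definitions. The ontologies, IRIs, and function names below are hypothetical toy data, not BioPortal content or the authors' method.

```python
from typing import Dict, List

# Hypothetical representation: an ontology maps term IRI -> preferred label.
Ontology = Dict[str, str]


def normalize(label: str) -> str:
    """Lowercase a label and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def reuse_fraction(source: Ontology, others: List[Ontology]) -> float:
    """Fraction of source terms whose IRI also appears in another ontology.

    Assumption: 'reuse' is approximated as an exact IRI match.
    """
    if not source:
        return 0.0
    other_iris = set()
    for onto in others:
        other_iris.update(onto.keys())
    reused = sum(1 for iri in source if iri in other_iris)
    return reused / len(source)


def overlap_fraction(source: Ontology, others: List[Ontology]) -> float:
    """Fraction of source terms that are NOT reused by IRI but whose
    normalized label matches a label in another ontology.

    Assumption: 'overlap' is approximated as a label match under a different IRI.
    """
    if not source:
        return 0.0
    other_iris = set()
    other_labels = set()
    for onto in others:
        other_iris.update(onto.keys())
        other_labels.update(normalize(lbl) for lbl in onto.values())
    overlapping = sum(
        1
        for iri, label in source.items()
        if iri not in other_iris and normalize(label) in other_labels
    )
    return overlapping / len(source)


# Toy ontologies (hypothetical IRIs and labels).
onto_a: Ontology = {
    "http://example.org/A#1": "Heart",
    "http://example.org/shared#2": "Lung",
    "http://example.org/A#3": "Kidney",
    "http://example.org/A#4": "Liver",
}
onto_b: Ontology = {
    "http://example.org/shared#2": "Lung",
    "http://example.org/B#9": "heart ",
    "http://example.org/B#10": "Brain",
}

print(reuse_fraction(onto_a, [onto_b]))    # one of four terms shares an IRI
print(overlap_fraction(onto_a, [onto_b]))  # one of four terms shares only a label
```

In this toy case "Lung" counts as reuse because both ontologies use the same IRI, while "Heart" counts as overlap because the same concept is declared twice under different IRIs. A gap between the two fractions is the kind of signal the study reports at corpus scale.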

Updated: 2017-08-07