Application and evaluation of knowledge graph embeddings in biomedical data
PeerJ Computer Science (IF 3.5), Pub Date: 2021-02-18, DOI: 10.7717/peerj-cs.341
Mona Alshahrani¹, Maha A Thafar²,³, Magbubah Essack²

Linked data and bio-ontologies, which enable knowledge representation, standardization, and dissemination, are an integral part of developing biological and biomedical databases. Specifically, linked data and bio-ontologies are employed in databases to maintain data integrity and data organization and to empower search capabilities. More recently, however, linked data and bio-ontologies have also been used to represent information as multi-relational heterogeneous graphs, known as "knowledge graphs". This is because entities and relations in a knowledge graph can be represented as embedding vectors in a semantic space, and these embedding vectors have been used to predict relationships between entities. Such knowledge graph embedding methods provide a practical approach to data analytics and increase the chances of building machine learning models with high prediction accuracy that can enhance decision support systems. Here, we present a comparative assessment and a standard benchmark for knowledge graph-based representation learning methods, focused on the link prediction task for biological relations. We systematically investigated and compared state-of-the-art embedding methods based on the design settings used for training and evaluation. We further tested various strategies for controlling the amount of information related to each relation in the knowledge graph and examined how this affects the final performance. We also assessed the quality of the knowledge graph features through clustering and visualization, and employed several evaluation metrics to examine their uses and differences. Based on this systematic comparison and assessment, we identify and discuss the limitations of knowledge graph-based representation learning methods and suggest some guidelines for the development of improved methods.
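The core idea above, that entities and relations become vectors whose geometry scores candidate links, can be illustrated with a TransE-style translation score, ‖h + r − t‖. The sketch below is a minimal illustration under stated assumptions: the 2-D vectors are hand-picked rather than trained, the entity and relation names (`geneA`, `diseaseX`, `associated_with`, etc.) are hypothetical, and the helper functions are not the paper's benchmark code or any library's API. It also computes mean rank, mean reciprocal rank (MRR), and Hits@k, which are common rank-based link-prediction metrics; they are not necessarily the exact metrics used in the paper.

```python
import math

# Hypothetical toy embeddings (2-D), chosen by hand for illustration only.
# A real model learns these vectors from the knowledge graph during training.
entities = {
    "geneA": [0.0, 0.0],
    "diseaseX": [1.0, 0.0],
    "diseaseY": [0.0, 1.0],
    "drugZ": [3.0, 3.0],
}
relations = {
    "associated_with": [1.0, 0.0],
}


def transe_score(head, relation, tail):
    """TransE-style plausibility score: the negative L2 distance ||h + r - t||.

    Scores closer to zero mean the triple (head, relation, tail) is more plausible.
    """
    h, r, t = entities[head], relations[relation], entities[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(head, relation, true_tail):
    """Rank of the true tail among all entities used as candidate tails (1 = best).

    This is the "raw" setting: other true triples are not filtered out, and
    ties are resolved optimistically (only strictly better candidates count).
    """
    true_score = transe_score(head, relation, true_tail)
    better = sum(
        1 for e in entities if transe_score(head, relation, e) > true_score
    )
    return better + 1


def ranking_metrics(test_triples, k=1):
    """Mean rank, mean reciprocal rank (MRR) and Hits@k over the test triples."""
    ranks = [tail_rank(h, r, t) for h, r, t in test_triples]
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        "mrr": sum(1.0 / rank for rank in ranks) / n,
        f"hits@{k}": sum(1 for rank in ranks if rank <= k) / n,
    }


if __name__ == "__main__":
    test = [
        ("geneA", "associated_with", "diseaseX"),
        ("geneA", "associated_with", "diseaseY"),
    ]
    print(ranking_metrics(test, k=1))
```

With these toy vectors, the first test triple is ranked 1st and the second 3rd, giving a mean rank of 2.0 and a Hits@1 of 0.5. Benchmarks commonly also report a "filtered" setting, in which other known true triples are removed from the candidate list before ranking, so that a model is not penalized for scoring another correct answer highly.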

Updated: 2021-02-18