From electronic health records to terminology base: A novel knowledge base enrichment approach
Journal of Biomedical Informatics (IF 4.0), Pub Date: 2020-11-21, DOI: 10.1016/j.jbi.2020.103628
Jiaying Zhang 1 , Zhixing Zhang 1 , Huanhuan Zhang 1 , Zhiyuan Ma 1 , Qi Ye 1 , Ping He 2 , Yangming Zhou 1

Enriching a terminology base (TB) is an important and continuous process, since formal terms can be renamed and new term aliases emerge all the time. As a potential supplement for TB enrichment, the electronic health record (EHR) is a fundamental source for clinical research and practice. The task of aligning the set of external terms in EHRs to a TB can be regarded as entity alignment without structure information. Conventional approaches mainly use the internal structural information of multiple knowledge bases (KBs) to map entities to their counterparts among KBs. However, the external terms in EHRs are independent clinical terms, which lack interrelations. To achieve entity alignment in this case, we proposed a novel automatic TB enrichment approach, named semantic & structure embeddings-based relevancy prediction (S2ERP). To obtain the semantic embeddings of external terms, we fed them together with formal entities into a pre-trained language model. Meanwhile, a graph convolutional network was used to obtain the structure embeddings of the synonyms and hyponyms in the TB. Afterwards, S2ERP combines both embeddings to measure the relevancy. Experimental results on a clinical indicator TB, collected from 38 top-class hospitals of Shanghai Hospital Development Center, showed that the proposed approach outperforms baseline methods by 14.16% in Hits@1.
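For readers unfamiliar with the reported metric, Hits@1 is commonly defined as the fraction of queries whose top-ranked candidate is the correct one. The sketch below illustrates that general definition only; it is not the authors' evaluation code. The function name `hits_at_k`, the score matrix layout, and the toy numbers are all assumptions made for illustration.

```python
from typing import Sequence


def hits_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int = 1) -> float:
    """Fraction of external terms whose correct TB entity is ranked in the top k.

    scores[i][j] is a hypothetical relevancy score between external term i
    and candidate TB entity j (higher means more relevant).
    gold[i] is the index of the correct TB entity for external term i.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for row, target in zip(scores, gold):
        # Rank candidate indices by descending score; ties keep index order.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy (made-up) scores for three external terms against four candidate entities.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct entity 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # correct entity 1 is ranked second
    [0.1, 0.2, 0.3, 0.7],  # correct entity 3 is ranked first
]
toy_gold = [0, 1, 3]

print(hits_at_k(toy_scores, toy_gold, k=1))
print(hits_at_k(toy_scores, toy_gold, k=2))
```

In this toy case two of the three terms have their correct entity ranked first, so Hits@1 is 2/3, while every correct entity falls within the top two, so Hits@2 is 1.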




Updated: 2020-11-21