LATTE: Latent Type Modeling for Biomedical Entity Linking
arXiv - CS - Information Retrieval, Pub Date: 2019-11-21, DOI: arxiv-1911.09787
Ming Zhu, Busra Celikkaya, Parminder Bhatia, Chandan K. Reddy

Entity linking is the task of linking mentions of named entities in natural language text to entities in a curated knowledge base. This is of significant importance in the biomedical domain, where it could be used to semantically annotate a large volume of clinical records and biomedical literature with standardized concepts described in an ontology such as the Unified Medical Language System (UMLS). We observe that with precise type information, entity disambiguation becomes a straightforward task. However, fine-grained type information is usually not available in the biomedical domain. Thus, we propose LATTE, a LATent Type Entity Linking model, that improves entity linking by modeling the latent fine-grained type information about mentions and entities. Unlike previous methods that perform entity linking directly between the mentions and the entities, LATTE jointly performs entity disambiguation and latent fine-grained type learning, without direct supervision. We evaluate our model on two biomedical datasets: MedMentions, a large-scale public dataset annotated with UMLS concepts, and a de-identified corpus of dictated doctor's notes that has been annotated with ICD concepts. Extensive experimental evaluation shows our model achieves significant performance improvements over several state-of-the-art techniques.
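
The observation that precise type information makes disambiguation straightforward can be illustrated with a toy sketch. This is not the LATTE model: in LATTE the fine-grained types are latent and learned jointly with disambiguation, whereas here the type is simply assumed to be given. The knowledge base, the entity identifiers (`E1`, `E2`, `E3`), the type labels, and the helper names `candidates` and `link` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str   # hypothetical identifier, not a real UMLS or ICD code
    name: str
    type_label: str  # hypothetical fine-grained type, assumed to be known


# Toy knowledge base: two entities share the surface form "cold".
KB = [
    Entity("E1", "common cold", "disease"),
    Entity("E2", "cold temperature", "natural_phenomenon"),
    Entity("E3", "fever", "sign_or_symptom"),
]


def candidates(mention, kb):
    """Return every entity whose name contains the mention as a whole word."""
    word = mention.lower()
    return [e for e in kb if word in e.name.split()]


def link(mention, mention_type, kb):
    """Keep only the candidates whose type matches the mention's type."""
    return [e for e in candidates(mention, kb) if e.type_label == mention_type]


if __name__ == "__main__":
    # Without type information the mention "cold" is ambiguous.
    print([e.entity_id for e in candidates("cold", KB)])
    # With a precise type, a single candidate remains.
    print([e.entity_id for e in link("cold", "disease", KB)])
```

In this sketch the type acts as a hard filter over candidates. When types are unavailable, as the abstract notes is usual in the biomedical domain, such a filter cannot be applied directly, which is the gap that modeling latent types is meant to address.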

Updated: 2020-01-22