Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization.
Journal of the American Medical Informatics Association (IF 6.4), Pub Date: 2020-07-27, DOI: 10.1093/jamia/ocaa080
Dongfang Xu, Manoj Gopale, Jiacheng Zhang, Kris Brown, Edmon Begoli, Steven Bethard

Abstract
Objective
Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks, including relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in Track 3 (Concept Normalization) of the 2019 National NLP Clinical Challenges (n2c2) shared task.
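As a concrete illustration of the task (the mention and concept below are illustrative, not drawn from the shared task data):

```python
# Concept normalization links a free-text phrase to an ontology concept.
# Illustrative example: in UMLS, C0027051 is the CUI (Concept Unique
# Identifier) for "Myocardial Infarction".
mention = "heart attack"      # phrase as it appears in a clinical note
predicted_cui = "C0027051"    # UMLS concept the phrase should link to
```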
Materials and Methods
The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer.
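A minimal sketch of the sieve idea, with plain dictionary lookups standing in for the Lucene indices described above (the exact lower-cased matching and stop-at-first-tier behavior are assumptions; the function and variable names are hypothetical):

```python
from typing import Dict, List

def sieve_candidates(
    mention: str,
    train_lookup: Dict[str, List[str]],     # training mention text -> gold CUIs
    preferred_terms: Dict[str, List[str]],  # UMLS preferred term -> CUIs
    synonyms: Dict[str, List[str]],         # UMLS synonym -> CUIs
) -> List[str]:
    """Return candidate CUIs from the first tier that yields a match.

    Tiers follow the abstract: training data first, then UMLS preferred
    terms, then UMLS synonyms. Dictionary lookups replace Lucene here.
    """
    key = mention.lower().strip()
    for tier in (train_lookup, preferred_terms, synonyms):
        candidates = tier.get(key, [])
        if candidates:
            return candidates
    return []  # no tier matched; the ranker gets an empty candidate list
```

The candidate lists produced this way are what the BERT-based listwise classifier then scores and ranks.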
Results
Our generate-and-rank system placed third of 33 in the competition, outperforming both the candidate generator alone (81.66% vs 79.44% accuracy) and the previous state of the art (76.35%). During the post-evaluation phase, we improved the model's accuracy to 83.56% by refining how training data are generated from UMLS and by incorporating our UMLS semantic type regularizer.
Discussion
Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training.
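The abstract does not spell out the regularizer's functional form; the sketch below shows one plausible shape in PyTorch, where a hinge-style penalty pushes candidates that share the gold concept's UMLS semantic type above wrongly typed candidates (`margin` and `reg_weight` are hypothetical hyperparameters):

```python
import torch
import torch.nn.functional as F

def listwise_loss_with_type_regularizer(
    scores: torch.Tensor,     # (num_candidates,) BERT-based score per candidate
    gold_index: int,          # position of the correct concept in the list
    same_type: torch.Tensor,  # (num_candidates,) bool: shares gold's semantic type
    margin: float = 0.1,
    reg_weight: float = 0.1,
) -> torch.Tensor:
    # Listwise ranking objective: softmax cross-entropy over the candidate list.
    ce = F.cross_entropy(
        scores.unsqueeze(0),
        torch.tensor([gold_index], device=scores.device),
    )
    # Assumed regularizer: every wrongly typed candidate should score at
    # least `margin` below the best correctly typed candidate. The gold
    # concept trivially shares its own type, so `same_type` is never empty.
    best_same = scores[same_type].max()
    penalty = F.relu(scores[~same_type] - best_same + margin).sum()
    return ce + reg_weight * penalty
```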
Conclusions
Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network–based ranking model to accurately link phrases in text to UMLS concepts.


Updated: 2020-10-16