Clinical Relation Extraction Using Transformer-based Models
arXiv - CS - Information Retrieval. Pub Date: 2021-07-19, DOI: arxiv-2107.08957
Xi Yang, Zehao Yu, Yi Guo, Jiang Bian, Yonghui Wu

The newly emerged transformer technology has had a tremendous impact on NLP research. In the general English domain, transformer-based models have achieved state-of-the-art performance on various NLP benchmarks. In the clinical domain, researchers have also investigated transformer models for clinical applications. The goal of this study is to systematically explore three widely used transformer-based models (i.e., BERT, RoBERTa, and XLNet) for clinical relation extraction and to develop an open-source package with clinically pre-trained transformer-based models to facilitate information extraction in the clinical domain. We developed a series of clinical RE models based on three transformer architectures, namely BERT, RoBERTa, and XLNet. We evaluated these models using two publicly available datasets from the 2018 MADE1.0 and 2018 n2c2 challenges. We compared two classification strategies (binary vs. multi-class classification) and investigated two approaches to generating candidate relations in different experimental settings. The RoBERTa-clinical RE model achieved the best performance on the 2018 MADE1.0 dataset with an F1-score of 0.8958. On the 2018 n2c2 dataset, the XLNet-clinical model achieved the best F1-score of 0.9610. Our results indicated that the binary classification strategy consistently outperformed the multi-class classification strategy for clinical relation extraction. Our methods and models are publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerRelationExtraction. We believe this work will improve current practice on clinical relation extraction and other related NLP tasks in the biomedical domain.
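The candidate-relation generation step the abstract mentions is commonly implemented by pairing annotated entity spans and wrapping each pair in marker tokens before feeding the sequence to a transformer classifier. The sketch below illustrates that general idea only; the marker scheme, function name, and entity format are hypothetical and not taken from the authors' released package.

```python
from itertools import combinations

def generate_candidates(text, entities):
    """Build one marked input sequence per entity pair.

    entities: list of (start, end, label) character spans, assumed
    non-overlapping. Each pair is wrapped in bracketed markers such as
    [S:Drug] ... [E:Drug] so a sequence classifier (binary or
    multi-class) can score whether a relation holds between the two.
    """
    candidates = []
    for (s1, e1, t1), (s2, e2, t2) in combinations(sorted(entities), 2):
        marked = (
            text[:s1]
            + f"[S:{t1}]" + text[s1:e1] + f"[E:{t1}]"
            + text[e1:s2]
            + f"[S:{t2}]" + text[s2:e2] + f"[E:{t2}]"
            + text[e2:]
        )
        candidates.append(((t1, t2), marked))
    return candidates
```

For example, a sentence with a Drug span and a Reason span yields a single candidate sequence in which both spans are delimited by their type markers; under the binary strategy, a separate classifier per relation type scores each such sequence, while the multi-class strategy scores all types with one classifier.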

Updated: 2021-07-20