BioRel: towards large-scale biomedical relation extraction, BMC Bioinformatics


BioRel: towards large-scale biomedical relation extraction
BMC Bioinformatics (IF 2.9), Pub Date: 2020-12-16, DOI: 10.1186/s12859-020-03889-5
Rui Xing, Jie Luo, Tengwei Song

Although the biomedical literature is growing rapidly, structured knowledge that computer programs can easily process is still lacking. Extracting such knowledge from plain text and transforming it into structured form makes relation extraction an important problem. Datasets play a critical role in the development of relation extraction methods. However, existing relation extraction datasets in the biomedical domain are mainly human-annotated, and their scale is usually limited because annotation is labor-intensive and time-consuming. We construct BioRel, a large-scale dataset for biomedical relation extraction, using the Unified Medical Language System (UMLS) as the knowledge base and Medline as the corpus. We first identify entity mentions in Medline sentences and link them to UMLS with MetaMap. Then, we assign each sentence a relation label using distant supervision. Finally, we adapt state-of-the-art deep learning and statistical machine learning methods as baseline models and conduct comprehensive experiments on BioRel. The experimental results show that BioRel is a suitable large-scale dataset for biomedical relation extraction, providing reasonable baseline performance while leaving many challenges open for both deep learning and statistical methods.
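The distant supervision step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the entity identifiers (`E1`, `E2`, `E3`), the relation name `treats`, the `NA` fallback label, and the helper `distant_label` are all hypothetical, and the sentences are assumed to arrive with their two entities already linked (the role MetaMap plays in the paper). The idea shown is only the general one: a sentence inherits whatever relation the knowledge base records for its entity pair.

```python
from typing import Dict, List, NamedTuple, Tuple


class LinkedSentence(NamedTuple):
    """A sentence whose two entity mentions are already linked to KB identifiers."""
    text: str
    head_id: str
    tail_id: str


# Toy knowledge base: (head_id, tail_id) -> relation name.
# All identifiers and the relation name are made up for illustration.
ToyKB = Dict[Tuple[str, str], str]


def distant_label(
    sentences: List[LinkedSentence],
    kb: ToyKB,
    na_label: str = "NA",  # assumed fallback for pairs the KB does not relate
) -> List[Tuple[LinkedSentence, str]]:
    """Give each sentence the relation the KB holds for its entity pair."""
    return [(s, kb.get((s.head_id, s.tail_id), na_label)) for s in sentences]


toy_kb: ToyKB = {("E1", "E2"): "treats"}

sentences = [
    LinkedSentence("Compound E1 reduced symptoms of condition E2.", "E1", "E2"),
    LinkedSentence("Compound E1 was first described alongside E3.", "E1", "E3"),
]

labeled = distant_label(sentences, toy_kb)
for sentence, relation in labeled:
    print(f"{relation}: {sentence.text}")
```

Because the label comes from the entity pair and not from the sentence itself, a sentence that merely mentions both entities still receives the knowledge base relation; this label noise is a known property of distant supervision in general.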


Updated: 2020-12-16
