Biomedical named entity recognition using deep neural networks with contextual information.
BMC Bioinformatics ( IF 2.9 ) Pub Date : 2019-12-27 , DOI: 10.1186/s12859-019-3321-4
Hyejin Cho 1 , Hyunju Lee 1

BACKGROUND: In biomedical text mining, named entity recognition (NER) is an important task used to extract information from biomedical articles. Previously proposed NER methods are dictionary-based, rule-based, or machine learning approaches. However, these traditional approaches rely heavily on large-scale dictionaries, target-specific rules, or well-constructed corpora. They have been superseded by deep learning-based approaches, which are independent of hand-crafted features. Although such deep learning methods add conditional random fields (CRF) to capture important correlations between neighboring labels, they often do not incorporate all the contextual information from the text into the deep learning layers.

RESULTS: We propose an NER system for biomedical entities that incorporates n-grams with bi-directional long short-term memory (BiLSTM) and CRF; we refer to this system as contextual long short-term memory networks with CRF (CLSTM). We assess the CLSTM model on three corpora: the National Center for Biotechnology Information (NCBI) disease corpus, the BioCreative II Gene Mention corpus (GM), and the BioCreative V Chemical Disease Relation corpus (CDR). We compared our framework with several deep learning approaches, such as BiLSTM, BiLSTM with CRF, GRAM-CNN, and BERT. On the NCBI corpus, our model achieved an F-score of 85.68% for disease NER, an improvement of 1.50% over previous methods. Moreover, although BERT uses transfer learning that incorporates more than 2.5 billion words, our system performed similarly to BERT on the GM corpus, with an F-score of 81.44% for gene NER, and achieved a higher F-score of 86.44% for chemical and disease NER on the CDR corpus. We conclude that our method significantly improves performance on biomedical NER tasks.

CONCLUSION: The proposed approach is robust in recognizing biological entities in text.
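
The F-scores reported above can be made concrete with a small sketch. The abstract does not state the exact evaluation protocol, so the code below assumes the common convention of entity-level, exact-match scoring over BIO-tagged tokens; the function names `bio_to_spans` and `entity_f_score` and the example tags are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Assumption: a stray "I-" tag is treated as opening a new span.
            start, etype = i, tag[2:]
    return spans


def entity_f_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F-score (harmonic mean of precision and recall)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only: one disease mention found, one chemical mention missed.
gold_tags = ["B-Disease", "I-Disease", "O", "B-Chemical"]
pred_tags = ["B-Disease", "I-Disease", "O", "O"]
score = entity_f_score(gold_tags, pred_tags)
print(f"F-score: {score:.4f}")
```

In this toy case precision is 1.0 and recall is 0.5, so the F-score is 2/3. Under exact-match scoring a prediction that gets only part of a multi-token mention counts as a miss, which is one reason contextual information around entity boundaries can matter for the reported numbers.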

Updated: 2019-12-30