Journal of Biomedical Informatics (IF 4.5), Pub Date: 2020-10-26, DOI: 10.1016/j.jbi.2020.103609
Sudhakaran Gajendran, Manjula D, Vijayan Sugumaran
Named Entity Recognition (NER) is the process of identifying entities of different types in a given context. Biomedical Named Entity Recognition (BNER) is the task of extracting chemical names from biomedical texts to support biomedical and translational research. The aim of the system is to extract useful chemical names from biomedical literature without relying on extensive handcrafted features. The approach introduces a novel neural network architecture that combines a bidirectional long short-term memory (BLSTM) network, a dynamic recurrent neural network (RNN), and a conditional random field (CRF), and that uses character-level and word-level embeddings as its only features for identifying chemical entities. With this approach we achieve an F1 score of 89.98 on the BioCreAtIvE II GM corpus and 90.84 on the NCBI corpus, outperforming existing systems. Because the deep neural architecture uses both character-level and word-level embeddings, it captures morphological and orthographic information and eliminates the need for handcrafted features. The embeddings, together with the bidirectional LSTM network, proved to be an effective way to identify most of the chemical entities.
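To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF over token tags. It is not the authors' implementation: the BIO tag set, the function name `viterbi_decode`, the example tokens, and every score below are assumptions chosen for illustration. In the architecture described above, the per-token emission scores would presumably come from the BLSTM over the word and character embeddings; here they are hard-coded.

```python
# Illustrative sketch only. The tag scheme, scores, and names are assumptions,
# not values or identifiers taken from the paper.
from typing import Dict, List

TAGS = ["O", "B", "I"]  # assumed BIO scheme: Outside, Begin, Inside


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]  : score of `tag` for token t (stand-in for BLSTM output)
    transitions[a][b]  : score of moving from tag a to tag b
    start[tag]         : score of starting the sequence with `tag`
    """
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: start[tag] + emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []  # back[t-1][tag] = best previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical example: a large negative score forbids "I" directly after "O"
# and "I" at the start of a sentence.
FORBIDDEN = -1e4
transitions = {
    "O": {"O": 0.0, "B": 0.0, "I": FORBIDDEN},
    "B": {"O": 0.0, "B": 0.0, "I": 0.0},
    "I": {"O": 0.0, "B": 0.0, "I": 0.0},
}
start = {"O": 0.0, "B": 0.0, "I": FORBIDDEN}

tokens = ["of", "sodium", "chloride"]
emissions = [
    {"O": 2.0, "B": 0.1, "I": 0.0},
    {"O": 0.5, "B": 1.0, "I": 1.2},  # token-wise argmax would pick "I" here
    {"O": 0.3, "B": 0.2, "I": 1.5},
]

greedy = [max(TAGS, key=scores.get) for scores in emissions]
decoded = viterbi_decode(emissions, transitions, start)
print(list(zip(tokens, greedy)))   # per-token argmax, ignores tag transitions
print(list(zip(tokens, decoded)))  # CRF decoding, respects tag transitions
```

In this toy input, picking the best tag for each token independently gives `O, I, I`, which is not a valid BIO sequence, while the decoded path is `O, B, I`. This is one common motivation for placing a CRF on top of a recurrent tagger; whether the paper's transition scores are learned or constrained in this way is not stated in the abstract.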
Title: Character level and word level embedding with bidirectional LSTM – Dynamic recurrent neural network for biomedical named entity recognition from literature