Disease named entity recognition using long–short dependencies,Journal of Bioinformatics and Computational Biology

Disease named entity recognition using long–short dependencies
Journal of Bioinformatics and Computational Biology (IF 0.9), Pub Date: 2020-02-24, DOI: 10.1142/s0219720020500158
Houssemeddine Derbel, Anja Habacha Chaibi, Henda Hajjami Ben Ghezala


The automatic extraction of disease named entities is a challenging research problem that has attracted attention from the biomedical text mining community. Methods based on handcrafted features have been applied to this task with limited success, since they are constrained by the scope of expert knowledge. More recently, deep learning-based methods have been employed to address this issue. However, most architectures used for this task consider only long dependencies. The proposed method is a two-stage deep neural network model. We start by discovering local dependencies and creating high-level features from word embedding inputs using a deep convolutional neural network. We then identify long dependencies using a bi-directional recurrent neural network. To address the class imbalance produced by the BMEWO tagging schema and to reinforce sequence modeling, we developed a new part-of-speech (POS) based tagging schema that subdivides the dominant class into smaller, more balanced units. The proposed system was trained and tested on the NCBI disease corpus and achieved an F1-score of 85.59, outperforming the current state-of-the-art methods. Our results show the effectiveness of using both long and short dependencies. They also illustrate the benefits of combining different word embedding techniques and of incorporating morphological features in this task.
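The abstract does not spell out the POS-based tagging schema, so the following is only a minimal sketch of one plausible reading: tokens outside any disease mention (the dominant `O` class under BMEWO) are relabeled as `O-<POS>`, which splits that class into smaller units. The function names `bmewo_tags` and `split_outside_by_pos`, the `O-<POS>` label format, and the example sentence with its POS tags are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def bmewo_tags(n_tokens, spans):
    """Assign BMEWO tags given entity spans as (start, end), end exclusive.

    B = begin, M = middle, E = end, W = single-token entity, O = outside.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        if end - start == 1:
            tags[start] = "W"
        else:
            tags[start] = "B"
            for i in range(start + 1, end - 1):
                tags[i] = "M"
            tags[end - 1] = "E"
    return tags


def split_outside_by_pos(tags, pos_tags):
    """Hypothetical refinement: subdivide the O class by part-of-speech tag."""
    return [f"O-{pos}" if tag == "O" else tag for tag, pos in zip(tags, pos_tags)]


# Illustrative sentence; the POS tags are supplied by hand (assumed, not predicted).
tokens = ["The", "patient", "has", "colorectal", "cancer", "and", "asthma", "."]
pos_tags = ["DT", "NN", "VBZ", "JJ", "NN", "CC", "NN", "."]
spans = [(3, 5), (6, 7)]  # "colorectal cancer" and "asthma"

base = bmewo_tags(len(tokens), spans)
refined = split_outside_by_pos(base, pos_tags)

# Compare label distributions before and after subdividing the O class.
print(Counter(base))
print(Counter(refined))
```

In this toy sentence the single `O` label covers five of eight tokens under plain BMEWO, while the refined labels each occur once. On a real corpus the refined classes would still differ in size, but no single label would dominate to the same degree.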

Updated: 2020-02-24
