A neural network approach to chemical and gene/protein entity recognition in patents.

Journal of Cheminformatics (IF 8.6), Pub Date: 2018-12-18, DOI: 10.1186/s13321-018-0318-3
Ling Luo₁, Zhihao Yang₁, Pei Yang₁, Yin Zhang₂, Lei Wang₂, Jian Wang₁, Hongfei Lin₁

In biomedical research, patents contain a significant amount of information, and biomedical text mining of patents has recently received much attention. To accelerate the development of biomedical text mining for patents, the BioCreative V.5 challenge organized three tracks focused on biomedical entity recognition in patents: chemical entity mention recognition (CEMP), gene and protein related object recognition (GPRO), and technical interoperability and performance of annotation servers. This paper describes our neural network approach for the CEMP and GPRO tracks. In this approach, a bidirectional long short-term memory network with a conditional random field layer is employed to recognize biomedical entities in patents. To improve performance, we explored the effect of additional features for the neural network model, i.e., part-of-speech, chunking and named entity recognition features generated by the GENIA tagger. In the official results, our best runs achieved the highest performance among all participating teams in both tracks: a precision of 88.32%, a recall of 92.62%, and an F-score of 90.42% in the CEMP track, and a precision of 76.65%, a recall of 81.91%, and an F-score of 79.19% in the GPRO track.
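As a quick consistency check on the figures above, the following minimal sketch recomputes the F-score for each track from the reported precision and recall. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and the `reported` dictionary are illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1 (equal weight on both terms).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Official results quoted in the abstract, in percent.
reported = {
    "CEMP": {"precision": 88.32, "recall": 92.62, "f_score": 90.42},
    "GPRO": {"precision": 76.65, "recall": 81.91, "f_score": 79.19},
}

for track, r in reported.items():
    computed = f_score(r["precision"], r["recall"])
    print(f"{track}: computed F = {computed:.2f}, reported F = {r['f_score']:.2f}")
```

Under that assumption, the recomputed values agree with the reported F-scores to two decimal places for both tracks.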

Updated: 2018-12-18
