OGER++: hybrid multi-type entity recognition.
Journal of Cheminformatics (IF 8.6), Pub Date: 2019-01-21, DOI: 10.1186/s13321-018-0326-3
Lenz Furrer 1, Anna Jancso 1, Nicola Colic 1, Fabio Rinaldi 1,2

We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
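The two-stage design described in the abstract (dictionary look-up over normalized spellings, followed by a postfilter) can be illustrated with a minimal Python sketch. This is not the OGER++ implementation: the function names, the normalization rule (lowercasing and splitting into letter and digit runs), the `EX:0001` concept identifier, and the `score` callable standing in for the feed-forward network are all assumptions made for illustration.

```python
import re

def normalize(term):
    """Lowercase and split into letter/digit runs, so 'IL-2', 'IL2' and 'il 2' coincide."""
    return tuple(re.findall(r"[a-z]+|[0-9]+", term.lower()))

def build_index(entries):
    """Map each normalized name to the list of concept IDs it may denote."""
    index = {}
    for name, concept_id in entries:
        index.setdefault(normalize(name), []).append(concept_id)
    return index

def annotate(text, index):
    """Return (start, end, surface, concept_id) for the longest dictionary match at each token."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z]+|[0-9]+", text)]
    max_len = max((len(key) for key in index), default=0)
    hits = []
    for i in range(len(tokens)):
        # Try the longest candidate window first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[0] for tok in tokens[i:i + n])
            if key in index:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                for concept_id in index[key]:
                    hits.append((start, end, text[start:end], concept_id))
                break
    return hits

def postfilter(hits, score, threshold=0.5):
    """Keep hits whose score passes the threshold; `score` is a placeholder for a trained classifier."""
    return [hit for hit in hits if score(hit) >= threshold]

# Hypothetical dictionary entries; EX:0001 is a made-up identifier.
index = build_index([("IL-2", "EX:0001"), ("interleukin 2", "EX:0001")])
hits = annotate("Il 2 and interleukin-2 bind the IL2 receptor.", index)
surfaces = [hit[2] for hit in hits]
```

In this sketch all three spelling variants in the example sentence resolve to the same dictionary entry, and `postfilter` then discards candidates that the scoring function rejects. A real system would replace the `score` callable with a classifier trained on an annotated corpus.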

Updated: 2019-01-21