Broad-coverage biomedical relation extraction with SemRep.
BMC Bioinformatics (IF 3), Pub Date: 2020-05-14, DOI: 10.1186/s12859-020-3517-7
Halil Kilicoglu 1,2, Graciela Rosemblat 1, Marcelo Fiszman 3, Dongwook Shin 1

BACKGROUND: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

RESULTS: A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.

CONCLUSIONS: SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had a significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
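
The F1 scores reported in the results can be checked against the stated precision and recall values. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary labels are illustrative and do not come from the paper. For the sentence-bound CDR evaluation the abstract gives only recall and F1, so the precision of 0.90 used there is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "strict": (0.55, 0.34),
    "relaxed": (0.69, 0.42),
    "CDR, all relations": (0.90, 0.24),
    # Assumption: precision is taken to stay at 0.90 here;
    # the abstract states only the recall (0.35) and F1 (0.50).
    "CDR, sentence-bound": (0.90, 0.35),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

Under these assumptions, the computed values round to 0.42, 0.52, 0.38, and 0.50, matching the F1 scores in the abstract.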

Updated: 2020-05-14