Exploiting sequence labeling framework to extract document-level relations from biomedical texts.
BMC Bioinformatics ( IF 2.9 ) Pub Date : 2020-03-27 , DOI: 10.1186/s12859-020-3457-2
Zhiheng Li 1 , Zhihao Yang 1 , Yang Xiang 2 , Ling Luo 1 , Yuanyuan Sun 1 , Hongfei Lin 1

BACKGROUND Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either focus on extracting intra-sentential relations and ignore inter-sentential ones, or fail to extract inter-sentential relations accurately and treat the instances containing entity relations as independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In this method, the sequence labeling framework is extended with multiple specified feature extractors to facilitate feature extraction at different levels, especially at the inter-sentential level. In addition, the sequence labeling framework enables Bio-Seq to take advantage of the interactions between relations, which further improves the precision of document-level relation extraction.

RESULTS The proposed method obtained an F1-score of 63.5% on the BioCreative V chemical disease relation corpus and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. The method also achieved an F1-score of 85.1% on the n2c2-ADE sub-dataset.

CONCLUSION The sequence labeling method can be successfully used to extract document-level relations, and is especially effective at boosting performance on inter-sentential relation extraction. Our work can facilitate research on document-level biomedical text mining.
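The abstract does not spell out the tagging scheme, so the following is only a minimal sketch of how relation extraction might be cast as sequence labeling: fix one source entity, then tag every token of the document so that mentions of related target entities receive `B-REL`/`I-REL` and everything else receives `O`. The function name `tag_targets`, the tag names, the entity IDs, and the example sentence are all hypothetical and are not taken from the paper or from any corpus.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity id); hypothetical format
Span = Tuple[int, int, str]


def tag_targets(num_tokens: int,
                target_mentions: List[Span],
                related_ids: Set[str]) -> List[str]:
    """Build one BIO tag per token for a single, fixed source entity.

    All target entities related to the source are marked in the same
    sequence, so a sequence model sees them jointly rather than as
    independent classification instances.
    """
    tags = ["O"] * num_tokens
    for start, end, entity_id in target_mentions:
        if entity_id not in related_ids:
            continue  # mention of an unrelated entity stays "O"
        tags[start] = "B-REL"
        for i in range(start + 1, end):
            tags[i] = "I-REL"
    return tags


# Made-up three-sentence document; the source entity is "Lithium".
tokens = ("Lithium caused tremor . Later , renal failure developed . "
          "Headache was absent .").split()
mentions = [
    (2, 3, "D_tremor"),          # same sentence as the source mention
    (6, 8, "D_renal_failure"),   # different sentence (inter-sentential)
    (10, 11, "D_headache"),      # mentioned but not related
]
related = {"D_tremor", "D_renal_failure"}

tags = tag_targets(len(tokens), mentions, related)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this framing, intra- and inter-sentential targets are labeled in one pass over the document, which is one plausible reading of how a labeling model could exploit interactions between relations.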

Updated: 2020-04-22