Coreference Resolution in Research Papers from Multiple Domains,arXiv - CS - Information Retrieval

arXiv - CS - Information Retrieval. Pub Date: 2021-01-04, DOI: arxiv-2101.00884
Arthur Brack, Daniel Uwe Müller, Anett Hoppe, Ralph Ewerth

Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).
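To make the link between coreference resolution and KG population concrete, the following is a minimal sketch, not the authors' pipeline. It assumes mentions are plain strings and that a coreference resolver has already produced pairwise links between them; the function name `populate_kg` and the sample mentions are hypothetical. Linked mentions are merged into a single node with a small union-find, so the populated graph ends up with fewer, less redundant nodes.

```python
from collections import defaultdict


def populate_kg(mentions, coref_links):
    """Group mentions into KG nodes; coreferent mentions share one node.

    Hypothetical helper for illustration only, not the paper's method.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Each coreference link merges the two clusters it connects.
    for a, b in coref_links:
        parent[find(a)] = find(b)

    nodes = defaultdict(list)
    for m in mentions:
        nodes[find(m)].append(m)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in nodes.values())


# Made-up mentions and links, standing in for resolver output.
mentions = ["coreference resolution", "this task", "BERT", "the model"]
links = [("coreference resolution", "this task"), ("BERT", "the model")]

without_coref = populate_kg(mentions, [])   # every mention is its own node
with_coref = populate_kg(mentions, links)   # coreferent mentions are merged
```

Without links, each surface form becomes a separate node; with links, "this task" and "the model" collapse into the nodes of the concepts they refer to.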

Updated: 2021-01-05
