Domain-independent Extraction of Scientific Concepts from Research Articles, arXiv - CS - Digital Libraries

arXiv - CS - Digital Libraries. Pub Date: 2020-01-09, DOI: arxiv-2001.03067
Arthur Brack, Jennifer D'Souza, Anett Hoppe, Sören Auer, Ralph Ewerth

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is used to annotate, at the phrasal level and in a joint effort with domain experts, a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data.

Updated: 2020-05-22
