Domain-independent Extraction of Scientific Concepts from Research Articles
arXiv - CS - Digital Libraries Pub Date: 2020-01-09, DOI: arxiv-2001.03067 Arthur Brack, Jennifer D'Souza, Anett Hoppe, Sören Auer, Ralph Ewerth
We examine the novel task of domain-independent scientific concept extraction
from abstracts of scholarly articles and present two contributions. First, we
suggest a set of generic scientific concepts that have been identified in a
systematic annotation process. This set of concepts is utilised to annotate a
corpus of scientific abstracts from 10 domains of Science, Technology and
Medicine at the phrasal level in a joint effort with domain experts. The
resulting dataset is used in a set of benchmark experiments to (a) provide
baseline performance for this task and (b) examine the transferability of concepts
between domains. Second, we present two deep learning systems as baselines. In
particular, we propose active learning to deal with different domains in our
task. The experimental results show that (1) substantial agreement is
achievable by non-experts after consultation with domain experts, (2) the
baseline system achieves a fairly high F1 score, and (3) active learning enables us
to nearly halve the amount of required training data.
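
The abstract does not say which query strategy the active learning setup uses. As a rough illustration of the general idea only, the sketch below assumes pool-based least-confidence sampling: the model scores every unlabeled item, and the items whose top predicted class is least certain are sent to annotators first. The function names, the item ids, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty of one prediction: 1 minus the highest class probability."""
    return 1.0 - max(probs)


def select_for_annotation(pool_probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k most uncertain unlabeled items (hypothetical helper)."""
    ranked = sorted(
        pool_probs,
        key=lambda item_id: least_confidence(pool_probs[item_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Made-up class probabilities for three unlabeled abstracts.
pool = {
    "abstract-1": [0.90, 0.05, 0.05],  # confident prediction
    "abstract-2": [0.40, 0.35, 0.25],  # very uncertain prediction
    "abstract-3": [0.60, 0.30, 0.10],  # moderately uncertain prediction
}

print(select_for_annotation(pool, 2))  # ['abstract-2', 'abstract-3']
```

In a full loop, the selected items would be labeled, added to the training set, and the model retrained before the next selection round; annotating the most informative items first is what lets such a loop get by with less labeled data.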
Updated: 2020-05-22