NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

arXiv - CS - Digital Libraries, Pub Date: 2020-06-23, DOI: arxiv-2006.12870
Jennifer D'Souza and Sören Auer

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we obtained an annotation methodology and identified ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The annotation scheme we developed from these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic, leading to its further refinement and development. Our pilot dataset of 50 NLP-ML scholarly articles, annotated according to the NLPContributions scheme, is openly available to the research community at https://doi.org/10.25835/0019761.
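The subject-predicate-object statements mentioned in the abstract can be pictured with a minimal Python sketch. The `Triple` type, the `group_by_subject` helper, and every example value below are hypothetical illustrations; they are not the information units or predicates defined by the NLPContributions scheme, nor the ORKG data model.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


def group_by_subject(triples):
    """Collect the (predicate, object) pairs attached to each subject."""
    grouped = defaultdict(list)
    for t in triples:
        grouped[t.subject].append((t.predicate, t.obj))
    return dict(grouped)


# Hypothetical statements, invented for illustration only.
statements = [
    Triple("Contribution", "has research problem", "named entity recognition"),
    Triple("Contribution", "has approach", "Example Model"),
    Triple("Example Model", "uses", "character embeddings"),
]

grouped = group_by_subject(statements)
for subject, facts in grouped.items():
    for predicate, obj in facts:
        print(f"{subject} -- {predicate} --> {obj}")
```

Because the object of one statement ("Example Model") can appear as the subject of another, a flat list of such triples is enough to express a nested description of a contribution.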


Updated: 2020-09-04
