QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
ACM Computing Surveys (IF 16.6), Pub Date: 2023-02-02, DOI: 10.1145/3560260
Anna Rogers, Matt Gardner, Isabelle Augenstein

Alongside huge volumes of research on deep learning models in NLP in recent years, there has been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with more than 80 new datasets appearing in the past 2 years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of “skills” that question answering/reading comprehension systems are supposed to acquire and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of overfocusing on English. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.




Updated: 2023-02-02