What do we expect from Multiple-choice QA Systems?
arXiv - CS - Computation and Language Pub Date : 2020-11-20 , DOI: arxiv-2011.10647 Krunal Shah, Nitish Gupta, Dan Roth
The recent success of machine learning systems on various QA datasets could
be interpreted as a significant improvement in models' language understanding
abilities. However, using various perturbations, multiple recent works have
shown that good performance on a dataset might not indicate performance that
correlates well with humans' expectations of models that "understand"
language. In this work we consider a top-performing model on several Multiple
Choice Question Answering (MCQA) datasets, and evaluate it against a set of
expectations one might have of such a model, using a series of
zero-information perturbations of the model's inputs. Our results show that the
model clearly falls short of our expectations, which motivates a modified
training approach that forces the model to better attend to its inputs. We show
that the new training paradigm leads to a model that performs on par with the
original model while better satisfying our expectations.
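The abstract's core evaluation idea is to feed the model inputs from which the informative component has been removed and check whether it still answers confidently. A minimal sketch of how such zero-information perturbations of an MCQA instance might be constructed is shown below; the field names and the particular perturbation set are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch: building zero-information perturbations of an MCQA
# instance. A model that genuinely attends to its inputs should not answer
# these variants confidently, since the blanked-out component carried the
# information needed to choose an option.

def zero_information_perturbations(instance):
    """Given an MCQA instance (a dict with 'context', 'question', and
    'options'), return variants in which one input component is replaced
    with an uninformative placeholder."""
    perturbed = []
    for field in ("context", "question"):
        variant = dict(instance)
        variant[field] = ""                      # blank out one component
        variant["perturbation"] = f"no_{field}"  # tag the variant
        perturbed.append(variant)
    # Variant with every answer option blanked out as well.
    no_options = dict(instance)
    no_options["options"] = ["" for _ in instance["options"]]
    no_options["perturbation"] = "no_options"
    perturbed.append(no_options)
    return perturbed

example = {
    "context": "The sky appears blue due to Rayleigh scattering.",
    "question": "Why is the sky blue?",
    "options": ["Rayleigh scattering", "Reflection from oceans", "Ozone"],
}
variants = zero_information_perturbations(example)
```

One would then compare the model's confidence on `variants` against its confidence on the original instance; a large gap in accuracy or confidence is the expected behavior, while unchanged predictions suggest the model is exploiting dataset artifacts rather than reading the inputs.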
Updated: 2020-11-25