Improving Unsupervised Question Answering via Summarization-Informed Question Generation,arXiv - CS - Computation and Language


arXiv - CS - Computation and Language, Pub Date: 2021-09-16, DOI: arxiv-2109.07954
Chenyang Lyu, Lifeng Shang, Yvette Graham, Jennifer Foster, Xin Jiang, Qun Liu

Question Generation (QG) is the task of generating a plausible question for a given (passage, answer) pair. Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer. A disadvantage of the heuristic approach is that the generated questions are heavily tied to their declarative counterparts. A disadvantage of the supervised approach is that it is heavily tied to the domain/language of the QA dataset used as training data. In order to overcome these shortcomings, we propose an unsupervised QG method which uses questions generated heuristically from summaries as a source of training data for a QG system. We make use of freely available news summary data, transforming declarative summary sentences into appropriate questions using heuristics informed by dependency parsing, named entity recognition and semantic role labeling. The resulting questions are then combined with the original news articles to train an end-to-end neural QG model. We extrinsically evaluate our approach using unsupervised QA: our QG model is used to generate synthetic QA pairs for training a QA model. Experimental results show that, trained with only 20k English Wikipedia-based synthetic QA pairs, the QA model substantially outperforms previous unsupervised models on three in-domain datasets (SQuAD1.1, Natural Questions, TriviaQA) and three out-of-domain datasets (NewsQA, BioASQ, DuoRC), demonstrating the transferability of the approach.
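The data-construction idea in the abstract can be illustrated with a toy sketch: a question is derived from a summary sentence by a heuristic, then paired with the full article rather than with the summary, so the question wording is less tied to the passage. This is not the authors' implementation. The names `WH_WORDS`, `make_question` and `build_training_example` are hypothetical, the single subject-replacement rule stands in for the dependency-parsing and semantic-role heuristics, and the entity type is passed in by hand where a named entity recognizer would normally supply it.

```python
from dataclasses import dataclass

# Hypothetical mapping from an entity type to a question word.
# Illustrative only; the abstract does not list the actual templates.
WH_WORDS = {"PERSON": "Who", "DATE": "When", "GPE": "Where"}


@dataclass
class QGExample:
    passage: str   # the original article (input to the QG model)
    answer: str    # the answer span (input to the QG model)
    question: str  # the heuristically generated question (training target)


def make_question(summary_sentence: str, answer: str, entity_type: str) -> str:
    """Replace a sentence-initial answer span with a wh-word.

    Toy rule: only handles answers in subject position.
    """
    sentence = summary_sentence.strip().rstrip(".")
    if not sentence.startswith(answer):
        raise ValueError("toy rule only handles answers at the start of the sentence")
    wh_word = WH_WORDS.get(entity_type, "What")
    remainder = sentence[len(answer):].strip()
    return f"{wh_word} {remainder}?"


def build_training_example(article: str, summary_sentence: str,
                           answer: str, entity_type: str) -> QGExample:
    """Pair a summary-derived question with the article, not the summary."""
    if answer not in article:
        # The answer must be recoverable from the passage the model will see.
        raise ValueError("answer span does not occur in the article")
    question = make_question(summary_sentence, answer, entity_type)
    return QGExample(passage=article, answer=answer, question=question)


if __name__ == "__main__":
    # Made-up article and summary text, for illustration only.
    article = ("In 1903 the Royal Swedish Academy of Sciences honoured "
               "Marie Curie, Pierre Curie and Henri Becquerel for their "
               "research on radiation.")
    summary = "Marie Curie won the Nobel Prize in Physics in 1903."
    example = build_training_example(article, summary, "Marie Curie", "PERSON")
    print(example.question)
```

In this sketch the question borrows its wording from the summary ("won the Nobel Prize in Physics") while the passage phrases the same fact differently, which is the property the abstract relies on to avoid questions that merely mirror their source sentence.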


Updated: 2021-09-17
