AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation
arXiv - CS - Computation and Language. Pub Date: 2021-06-10. DOI: arxiv-2106.05589
Xinnuo Xu, Guoyin Wang, Young-Bum Kim, Sungjin Lee

Natural Language Generation (NLG) is a key component of a task-oriented dialogue system: it converts a structured meaning representation (MR) into natural language. For large-scale conversational systems, where it is common to have hundreds of intents and thousands of slots, neither template-based nor model-based approaches scale well. Recently, neural NLG models have begun leveraging transfer learning and have shown promising results in few-shot settings. This paper proposes AUGNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model to automatically create MR-to-Text data from open-domain texts. The proposed system outperforms the state-of-the-art methods on the FewShotWOZ data in both BLEU and Slot Error Rate in most settings. We further confirm improved results on the FewShotSGD data and provide a comprehensive analysis of the key components of our system. Our code and data are available at https://github.com/XinnuoXu/AugNLG.
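To make the MR-to-Text task concrete, below is a minimal sketch of what such a pair can look like, assuming a dialogue-act style MR in the spirit of FewShotWOZ. The field names, example values, and the `linearize` helper are illustrative assumptions for this sketch, not the paper's exact schema.

```python
# Hypothetical MR-to-Text pair in a dialogue-act style (illustrative only;
# not the exact schema used by AUGNLG or FewShotWOZ).
mr = {
    "intent": "inform",
    "slots": {"name": "Pizza Hut", "food": "italian", "area": "centre"},
}
text = "Pizza Hut is an italian restaurant in the centre of town."


def linearize(mr):
    """Flatten the structured MR into the linearized string form
    typically fed to a seq2seq NLG model as input."""
    slot_str = ", ".join(f"{k} = {v}" for k, v in mr["slots"].items())
    return f"{mr['intent']} ( {slot_str} )"


# The model learns to map this linearized MR to the target text above.
print(linearize(mr))  # inform ( name = Pizza Hut, food = italian, area = centre )
```

AUGNLG's contribution is to manufacture many such (MR, text) pairs automatically: a self-trained retrieval model pulls candidate utterances from open-domain text, and a few-shot NLU model labels them with synthetic MRs.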

Updated: 2021-06-11