Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data
arXiv - CS - Computation and Language. Pub Date: 2020-09-20, DOI: arxiv-2009.09427
Rongsheng Zhang, Yinhe Zheng, Jianzhi Shao, Xiaoxi Mao, Yadong Xi, Minlie Huang

Recent advances in open-domain dialogue systems rely on the success of neural models that are trained on large-scale data. However, collecting large-scale dialogue data is usually time-consuming and labor-intensive. To address this data dilemma, we propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data. Specifically, a data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data. A ranking module is employed to filter out low-quality dialogues. Further, a model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs, thereby preventing dialogue models from being affected by the noise in the augmented data. Automatic and manual evaluation indicates that our method can produce high-quality dialogue pairs with diverse contents, and the proposed data-level and model-level dialogue distillation can improve the performance of competitive baselines.
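The model-level distillation step can be pictured with a standard knowledge-distillation objective: the student is trained on the (possibly noisy) augmented pairs while being regularized toward the output distribution of a teacher trained on the high-quality paired data. The sketch below is illustrative only and assumes this common formulation; the function name, the temperature/weighting hyperparameters, and the tensor shapes are not taken from the paper.

```python
# A minimal sketch of model-level distillation for a dialogue generation model,
# assuming a standard KD loss (hard labels on augmented pairs + KL to a teacher
# trained on clean paired data). Names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      pad_id=0, temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with KL divergence to the teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    target_ids: (batch, seq_len) token ids of the augmented response
    """
    vocab_size = student_logits.size(-1)
    # Hard-label loss on the augmented (retrieved) response tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab_size),
                         target_ids.view(-1), ignore_index=pad_id)
    # Soft-label loss: match the temperature-smoothed distribution of the
    # teacher, which was trained on high-quality paired dialogues; this is
    # what dampens the effect of noise in the augmented pairs.
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl
```

In this reading, the ranking module decides which retrieved post-response pairs enter training at all, while the teacher's soft targets keep the student from overfitting to whatever noise survives that filter.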

Updated: 2020-11-11