SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning
arXiv - CS - Computation and Language, Pub Date: 2020-09-22, DOI: arxiv-2009.10447
Hwaran Lee, Seokhwan Jo, HyungJun Kim, Sangkeun Jung, Tae-Yoon Kim

Recent neural approaches have remarkably improved the individual components of task-oriented dialog systems, yet optimizing overall system performance remains a challenge. In this paper, we propose SUMBT+LaRL, an end-to-end trainable neural dialog system with reinforcement learning. SUMBT+ estimates user acts as well as dialog belief states, and LaRL models latent system action spaces and generates responses given the estimated contexts. We experimentally demonstrate that a training framework in which SUMBT+ and LaRL are pretrained separately and the entire system is then fine-tuned significantly increases dialog success rates. We also propose new success criteria for applying reinforcement learning to the end-to-end dialog system, and we analyze how the results differ depending on the success criteria and the evaluation method. Our model achieves a new state-of-the-art success rate of 85.4% on corpus-based evaluation and a comparable success rate of 81.40% on the simulator-based evaluation provided by the DSTC8 challenge.
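
The data flow described in the abstract (a tracker that estimates belief states and user acts, followed by a module that generates a response from that estimated context) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `DialogContext`, `toy_tracker`, `toy_responder`, `run_turn`, and the slot name `hotel-pricerange` are hypothetical, and the keyword rules merely stand in for the neural SUMBT+ and LaRL models, whose actual implementation is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DialogContext:
    """Estimated context handed from the tracker to the response module."""
    belief_state: Dict[str, str] = field(default_factory=dict)
    user_acts: List[str] = field(default_factory=list)


def toy_tracker(utterance: str, prev: DialogContext) -> DialogContext:
    """Hypothetical keyword-based placeholder for SUMBT+.

    Carries the previous belief state forward and estimates the user
    acts for the current turn only.
    """
    belief = dict(prev.belief_state)
    acts: List[str] = []
    text = utterance.lower()
    if "cheap" in text:
        belief["hotel-pricerange"] = "cheap"  # illustrative slot name
        acts.append("inform")
    if "?" in text:
        acts.append("request")
    return DialogContext(belief, acts)


def toy_responder(ctx: DialogContext) -> str:
    """Hypothetical rule-based placeholder for LaRL response generation."""
    if "request" in ctx.user_acts:
        return "Here is the information you asked for."
    if ctx.belief_state:
        slots = ", ".join(f"{k}={v}" for k, v in sorted(ctx.belief_state.items()))
        return f"I found an option matching: {slots}"
    return "How can I help you?"


def run_turn(utterance: str, prev: DialogContext) -> Tuple[str, DialogContext]:
    """One end-to-end turn: track the context, then generate a response."""
    ctx = toy_tracker(utterance, prev)
    return toy_responder(ctx), ctx


if __name__ == "__main__":
    context = DialogContext()
    for user_turn in ["I need a cheap hotel", "What is the address?"]:
        reply, context = run_turn(user_turn, context)
        print(f"user: {user_turn}\nsystem: {reply}")
```

In the framework the abstract describes, the two placeholder functions would be separately pretrained neural modules, and the composed `run_turn` path is what end-to-end fine-tuning with reinforcement learning would optimize against a dialog success signal.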

Updated: 2020-10-07