Variational Denoising Autoencoders and Least-Squares Policy Iteration for Statistical Dialogue Managers
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.2998361
Vassilios Diakoloukas, Fotios Lygerakis, Michail G. Lagoudakis, Margarita Kotti

Reinforcement Learning (RL) approaches to dialogue policy optimization have become a prominent trend in dialogue management systems. Several proposed methods are trained on dialogue data to produce optimal system responses. However, most of them degrade in the presence of noise, scale poorly to other domains, and show unstable performance. To overcome these problems, we propose a novel approach based on the incremental, sample-efficient Least-Squares Policy Iteration (LSPI) algorithm, trained on compact, fixed-size dialogue state encodings obtained from deep Variational Denoising Autoencoders (VDAE). The proposed scheme exhibits stable, noise-robust performance and significantly outperforms the current state of the art, even in mismatched noise environments.
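
As a rough illustration of the LSPI component named in the abstract, the following is a minimal sketch on a toy two-state, two-action problem. It is not the authors' implementation: the environment, the constants, and every function name are assumptions made for this example. In particular, `encode` is a hypothetical one-hot stand-in for the fixed-size encoding that the paper obtains from a VDAE, and no dialogue data or noise model is involved.

```python
# Minimal LSPI sketch on a toy 2-state, 2-action problem (pure Python).
# Everything here is an illustrative assumption, not the paper's code.

GAMMA = 0.9      # discount factor (assumed value)
N_ACTIONS = 2
ENC_DIM = 2      # size of the fixed-size state encoding


def encode(state):
    """Hypothetical stand-in for a learned encoder: one-hot state encoding."""
    return [1.0 if i == state else 0.0 for i in range(ENC_DIM)]


def phi(state, action):
    """Copy the state encoding into the feature block that belongs to `action`."""
    features = [0.0] * (ENC_DIM * N_ACTIONS)
    start = action * ENC_DIM
    features[start:start + ENC_DIM] = encode(state)
    return features


def q_value(weights, state, action):
    return sum(w * f for w, f in zip(weights, phi(state, action)))


def greedy_action(weights, state):
    return max(range(N_ACTIONS), key=lambda a: q_value(weights, state, a))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def lstdq(samples, weights):
    """Policy evaluation: fit Q-weights for the policy that is greedy in `weights`."""
    k = ENC_DIM * N_ACTIONS
    # Small ridge term on the diagonal keeps the system well conditioned.
    a_mat = [[1e-6 if i == j else 0.0 for j in range(k)] for i in range(k)]
    b_vec = [0.0] * k
    for state, action, reward, next_state in samples:
        f = phi(state, action)
        f_next = phi(next_state, greedy_action(weights, next_state))
        for i in range(k):
            b_vec[i] += f[i] * reward
            for j in range(k):
                a_mat[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return solve(a_mat, b_vec)


def lspi(samples, max_iterations=20, tolerance=1e-8):
    """Alternate evaluation and greedy improvement until the weights stop changing."""
    weights = [0.0] * (ENC_DIM * N_ACTIONS)
    for _ in range(max_iterations):
        new_weights = lstdq(samples, weights)
        change = max(abs(n - o) for n, o in zip(new_weights, weights))
        weights = new_weights
        if change < tolerance:
            break
    return weights


# Toy samples (state, action, reward, next_state): action 0 stays, action 1
# switches state; the reward is 1 whenever the next state is 1.
SAMPLES = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]

weights = lspi(SAMPLES)
policy = [greedy_action(weights, s) for s in range(ENC_DIM)]
print(policy)  # prints [1, 0]: switch in state 0, stay in state 1
```

The sketch reuses the same fixed batch of samples at every iteration, which is the sense in which LSPI is usually described as sample-efficient. Replacing `encode` with a learned encoder would leave the rest of the loop unchanged, as long as the encoding size stays fixed.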

Updated: 2020-01-01