On the Weaknesses of Reinforcement Learning for Neural Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2019-07-03, DOI: arxiv-1907.01752
Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend

Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN). However, little is known about what and how these methods learn in the context of MT. We prove that one of the most common RL methods for MT does not optimize the expected reward, and show that other methods take an infeasibly long time to converge. In fact, our results suggest that RL practices in MT are likely to improve performance only where the pre-trained parameters are already close to yielding the correct translation. Our findings further suggest that observed gains may not be due to the training signal itself, but rather to changes in the shape of the distribution curve.
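
For reference, the following standard formulation is background and not taken from the abstract itself: the expected-reward objective that RL methods for MT nominally maximize, together with its REINFORCE-style policy-gradient estimator, can be written as

\[
J(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ R(y, y^{*}) \right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ R(y, y^{*})\, \nabla_\theta \log p_\theta(y \mid x) \right],
\]

where \(x\) is the source sentence, \(y\) a sampled translation, \(y^{*}\) the reference, \(p_\theta\) the model's conditional distribution over translations, and \(R\) a sentence-level reward such as BLEU. MRT optimizes the closely related expected risk, i.e., the expectation of a loss such as \(1 - \mathrm{BLEU}\) under the same distribution.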

Last updated: 2020-01-16