An EM Approach to Non-autoregressive Conditional Sequence Generation
arXiv - CS - Machine Learning. Pub Date: 2020-06-29, DOI: arXiv-2006.16378
Zhiqing Sun, Yiming Yang

Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but they suffer from high inference latency. Non-autoregressive (NAR) models have recently been proposed to reduce latency by generating all output tokens in parallel, but they achieve inferior accuracy compared to their autoregressive counterparts, primarily because of the difficulty of handling multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework. In the E-step, an AR model learns to approximate the regularized posterior of the NAR model. In the M-step, the NAR model is updated on the new posterior and selects the training examples for the next AR model. This iterative process effectively guides the system to remove multi-modality from the output sequences. To our knowledge, this is the first EM approach to NAR sequence generation. We evaluate our method on machine translation. Experimental results on benchmark data sets show that the proposed approach achieves competitive, if not better, performance relative to existing NAR models while significantly reducing inference latency.
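The E-step/M-step alternation described above can be sketched as a toy training loop. This is a minimal illustration of the iterative structure only, not the authors' implementation: the model classes, method names, and the mode-selection rule below are hypothetical placeholders standing in for real AR/NAR translation models.

```python
from collections import Counter, defaultdict

class ToyARModel:
    """Hypothetical stand-in for an AR model: it estimates a per-source
    target distribution and, when asked to select training examples,
    collapses each source to its single most frequent target (one mode)."""
    def fit(self, pairs):
        self.counts = defaultdict(Counter)
        for src, tgt in pairs:
            self.counts[src][tgt] += 1

    def select_examples(self, pairs):
        # Keep only the modal target for each source sentence.
        return [(src, self.counts[src].most_common(1)[0][0])
                for src, _ in pairs]

class ToyNARModel:
    """Hypothetical stand-in for a NAR model: memorizes one target per
    source; posterior_samples simply replays the current training data."""
    def fit(self, pairs):
        self.table = dict(pairs)

    def posterior_samples(self, pairs):
        return pairs

    def predict(self, src):
        return self.table[src]

def em_training(ar_model, nar_model, data, num_rounds=3):
    """Alternate E and M steps between the AR and NAR models."""
    for _ in range(num_rounds):
        # E-step: the AR model learns to approximate the (regularized)
        # posterior of the current NAR model.
        ar_model.fit(nar_model.posterior_samples(data))
        # M-step: the NAR model is updated on the new posterior; the AR
        # model also selects the training examples for the next round,
        # which removes multi-modality from the targets.
        data = ar_model.select_examples(data)
        nar_model.fit(data)
    return nar_model

# A multi-modal toy dataset: one source with two competing translations.
data = [("hello", "bonjour"), ("hello", "salut"), ("hello", "bonjour")]
nar = em_training(ToyARModel(), ToyNARModel(), data)
print(nar.predict("hello"))  # → bonjour (the dominant mode survives)
```

The point of the sketch is the feedback loop: each round, the selection step discards minority target modes, so the NAR model trains on an increasingly unimodal dataset, which is the effect the abstract attributes to the EM iteration.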

Updated: 2020-07-01