Non-Autoregressive Machine Translation with Disentangled Context Transformer
arXiv - CS - Computation and Language. Pub Date: 2020-01-15, DOI: arxiv-2001.05136
Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu

State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 translation directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average. Our code is available at https://github.com/facebookresearch/DisCo.
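The abstract packs two mechanisms into a few sentences: training the DisCo transformer so that each output position is predicted from an arbitrary subset of the other reference tokens, and parallel easy-first inference, which re-predicts every position in parallel while letting each position condition on tokens that were previously predicted with higher confidence. The sketch below illustrates both ideas in plain NumPy under simplifying assumptions; the function names (sample_disco_context_mask, parallel_easy_first_decode) and the predict_tokens stand-in for the model are hypothetical, not the authors' implementation, which is available in the linked repository.

import numpy as np

def sample_disco_context_mask(length, rng):
    # For each output position i, sample a random subset of the *other*
    # reference positions that i is allowed to condition on during training.
    # mask[i, j] == True means position i may attend to reference token j.
    mask = np.zeros((length, length), dtype=bool)
    for i in range(length):
        others = np.array([j for j in range(length) if j != i])
        if len(others) == 0:
            continue
        k = int(rng.integers(0, len(others) + 1))    # size of the sampled context
        context = rng.choice(others, size=k, replace=False)
        mask[i, context] = True
    return mask

def parallel_easy_first_decode(predict_tokens, length, num_iters=4):
    # Simplified parallel easy-first refinement: every iteration re-predicts
    # all positions at once; a position conditions only on positions whose
    # previous-iteration confidence was higher than its own ("easier" tokens).
    # predict_tokens(tokens, allowed) must return (new_tokens, confidences).
    tokens = np.full(length, -1)         # -1 marks a still-masked token id
    conf = np.zeros(length)
    for it in range(num_iters):
        if it == 0:
            # First pass is fully non-autoregressive: no position sees any other.
            allowed = np.zeros((length, length), dtype=bool)
        else:
            order = np.argsort(-conf)                # most confident first
            rank = np.empty(length, dtype=int)
            rank[order] = np.arange(length)
            allowed = rank[None, :] < rank[:, None]  # i may look at easier j
        tokens, conf = predict_tokens(tokens, allowed)
    return tokens

In this toy version the context restrictions are explicit Boolean matrices passed to a user-supplied prediction function; in the actual model, as the abstract states, the same per-position context restriction is realized through attention masking inside the transformer, which is what allows all tokens to be generated simultaneously.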

Updated: 2020-07-01