Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2021-06-09, DOI: arxiv-2106.05093
Cunxiao Du, Zhaopeng Tu, Jing Jiang

We propose a new training objective named order-agnostic cross entropy (OaXE) for fully non-autoregressive translation (NAT) models. OaXE improves the standard cross-entropy loss to ameliorate the effect of word reordering, which is a common source of the critical multimodality problem in NAT. Concretely, OaXE removes the penalty for word order errors, and computes the cross entropy loss based on the best possible alignment between model predictions and target tokens. Since the log loss is very sensitive to invalid references, we leverage cross entropy initialization and loss truncation to ensure the model focuses on a good part of the search space. Extensive experiments on major WMT benchmarks show that OaXE substantially improves translation performance, setting new state of the art for fully NAT models. Further analyses show that OaXE alleviates the multimodality problem by reducing token repetitions and increasing prediction confidence. Our code, data, and trained models are available at https://github.com/tencent-ailab/ICML21_OAXE.
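Below is a minimal sketch of how an OaXE-style loss could be computed, assuming per-position log-probabilities from a fully NAT decoder and a same-length target sequence; the best prediction-target alignment is found with the Hungarian algorithm. Function and variable names are illustrative and this is not the authors' released implementation (see the repository linked above for that).

```python
# Sketch of an order-agnostic cross-entropy (OaXE-style) loss.
# Assumptions: log_probs has shape [T, V] (one distribution per decoder position),
# target_ids has shape [T]; names such as oaxe_loss are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment


def oaxe_loss(log_probs: np.ndarray, target_ids: np.ndarray) -> float:
    """Cross entropy under the best possible alignment of predictions to targets."""
    # cost[i, j] = -log P(target token j | decoder position i)
    cost = -log_probs[:, target_ids]          # shape [T, T]
    rows, cols = linear_sum_assignment(cost)  # minimum-cost assignment (Hungarian algorithm)
    # Averaging the per-token losses under this assignment ignores word-order errors.
    return float(cost[rows, cols].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V = 5, 10
    logits = rng.normal(size=(T, V))
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    target = rng.integers(0, V, size=T)
    print("OaXE-style loss:", oaxe_loss(log_probs, target))
```

In practice the paper pairs this objective with cross-entropy initialization and loss truncation, as noted in the abstract, to keep training focused on plausible alignments.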

Updated: 2021-06-10