Learning Source Phrase Representations for Neural Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2020-06-25. DOI: arxiv-2006.14405
Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, Jingyi Zhang

The Transformer translation model (Vaswani et al., 2017), based on a multi-head attention mechanism, can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though intuitively the attentional network can connect distant words via shorter network paths than RNNs, empirical analysis demonstrates that it still has difficulty in fully capturing long-distance dependencies (Tang et al., 2018). Considering that modeling phrases instead of words significantly improved the Statistical Machine Translation (SMT) approach through the use of larger translation blocks ("phrases") and the ability to reorder them, modeling NMT at the phrase level is an intuitive proposal for helping the model capture long-distance relationships. In this paper, we first propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations. In addition, we incorporate the generated phrase representations into the Transformer translation model to enhance its ability to capture long-distance relationships. In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline, which shows the effectiveness of our approach. Our approach helps Transformer Base models perform at the level of Transformer Big models, and even significantly better on long sentences, with substantially fewer parameters and training steps. The fact that phrase representations help even in the Big setting further supports our conjecture that they make a valuable contribution to modeling long-distance relations.
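
The core idea described above, producing a phrase representation by attention-pooling the token representations inside each source phrase span, can be illustrated with a short sketch. The snippet below is not the authors' released implementation; the module name `AttentivePhrasePooling`, the `phrase_spans` argument, and the fixed-length segmentation in the usage example are illustrative assumptions. In the paper the resulting phrase vectors are then incorporated into the Transformer; only the pooling step is shown here.

```python
# A minimal sketch (assumed, not the authors' code) of attentive phrase
# representation generation: token vectors inside each phrase span are
# pooled with learned attention weights to yield one vector per phrase.
import torch
import torch.nn as nn


class AttentivePhrasePooling(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Scores each token inside a phrase; softmax over the span gives weights.
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, tokens: torch.Tensor, phrase_spans):
        """tokens: (seq_len, d_model) token representations for one sentence.
        phrase_spans: list of (start, end) indices segmenting the sentence
        into phrases (hypothetical input, e.g. fixed-length chunks).
        Returns (num_phrases, d_model) phrase representations."""
        phrase_vecs = []
        for start, end in phrase_spans:
            span = tokens[start:end]                       # (len, d_model)
            weights = torch.softmax(self.scorer(span), 0)  # (len, 1)
            phrase_vecs.append((weights * span).sum(0))    # attention-weighted sum
        return torch.stack(phrase_vecs)


# Usage example with random token representations and fixed-length spans.
if __name__ == "__main__":
    d_model, seq_len = 512, 9
    enc_out = torch.randn(seq_len, d_model)
    spans = [(0, 3), (3, 6), (6, 9)]   # hypothetical 3-token phrases
    pooling = AttentivePhrasePooling(d_model)
    phrases = pooling(enc_out, spans)
    print(phrases.shape)               # torch.Size([3, 512])
```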

Updated: 2020-06-26