A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation
Computational Linguistics (IF 9.3) Pub Date: 2020-06-01, DOI: 10.1162/coli_a_00377
Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this paper, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate cross-lingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
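To make the architecture concrete, below is a minimal PyTorch sketch of such an attention-bridge layer, based on the structured self-attention formulation (A = softmax(W2 tanh(W1 Hᵀ)), M = AH; Lin et al., 2017) that this line of work builds on. The class name AttentionBridge, the dimensions, and the masking convention are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBridge(nn.Module):
    """Sketch of a fixed-size sentence representation built from k
    inner-attention heads over the encoder states (hypothetical names)."""

    def __init__(self, hidden_dim: int, attn_dim: int, num_heads: int):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, attn_dim, bias=False)  # W1: d -> d_a
        self.w2 = nn.Linear(attn_dim, num_heads, bias=False)   # W2: d_a -> k

    def forward(self, enc_states: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, seq_len, hidden_dim); pad_mask: (batch, seq_len), True = padding
        scores = self.w2(torch.tanh(self.w1(enc_states)))       # (batch, seq_len, k)
        scores = scores.masked_fill(pad_mask.unsqueeze(-1), -1e9)  # ignore padded tokens
        attn = F.softmax(scores, dim=1)                          # normalize over the sentence
        # M = A^T H: k weighted sums of encoder states -> (batch, k, hidden_dim),
        # a fixed-size matrix regardless of sentence length
        return attn.transpose(1, 2) @ enc_states

bridge = AttentionBridge(hidden_dim=512, attn_dim=512, num_heads=10)
h = torch.randn(2, 7, 512)                      # two sentences, 7 tokens each
pad = torch.zeros(2, 7, dtype=torch.bool)       # no padding in this toy batch
m = bridge(h, pad)                              # shape (2, 10, 512)
```

In a full model of this kind, the decoder attends over the k rows of M rather than the variable-length encoder states, which is what makes the representation fixed-size and shareable across languages; pooling the k rows yields a single sentence vector for the downstream classification and similarity tasks discussed above.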

Updated: 2020-06-01