Context-aware positional representation for self-attention networks
Neurocomputing (IF 5.5), Pub Date: 2021-04-21, DOI: 10.1016/j.neucom.2021.04.055
Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

In self-attention networks (SANs), positional embeddings are used to model order dependencies between words in the input sentence and are added to word embeddings to obtain an input representation, which enables the SAN-based neural model to apply self-attention functions in parallel (multi-head) and in stacked layers (multi-layer) to learn the representation of the input sentence. However, this input representation only encodes static order dependencies based on the discrete position indexes of words; that is, it is independent of context information, which may weaken the modeling of the input sentence. To address this issue, we propose a novel positional representation method that models order dependencies based on the n-gram context or the sentence context of the input sentence, which allows SANs to learn a more effective sentence representation. To validate the effectiveness of the proposed method, we apply it to neural machine translation, a typical application of SAN-based neural models. Experimental results on two widely used translation tasks, i.e., WMT14 English-to-German and WMT17 Chinese-to-English, show that the proposed approach significantly improves translation performance over the strong Transformer baseline.
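To make the contrast concrete, below is a minimal PyTorch sketch of the two input representations discussed in the abstract: the standard one, in which a position embedding looked up by the discrete index is added to the word embedding, and a context-aware variant in which the position signal is derived from a local n-gram window of word embeddings. The class names, the convolution-based n-gram encoder, and the ngram parameter are illustrative assumptions for exposition, not the paper's exact formulation.

# Minimal sketch contrasting static index-based positional embeddings with a
# hypothetical context-aware positional representation (assumed here to be an
# n-gram convolution over word embeddings; not the paper's exact method).
import torch
import torch.nn as nn


class StaticPositionalEmbedding(nn.Module):
    """Standard input representation: word embedding + embedding of the discrete position index."""

    def __init__(self, vocab_size: int, d_model: int, max_len: int = 512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # depends only on the index, not on context

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (batch, seq_len)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return self.word_emb(tokens) + self.pos_emb(positions)  # (batch, seq_len, d_model)


class ContextAwarePositionalRepresentation(nn.Module):
    """Hypothetical sketch: the position signal is computed from the surrounding n-gram
    context (here a 1D convolution over word embeddings), so it varies with the sentence
    content rather than with the position index alone."""

    def __init__(self, vocab_size: int, d_model: int, ngram: int = 3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # Convolution over an n-gram window yields a context-dependent position vector per token.
        self.context_pos = nn.Conv1d(d_model, d_model, kernel_size=ngram, padding=ngram // 2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (batch, seq_len)
        x = self.word_emb(tokens)                     # (batch, seq_len, d_model)
        pos = self.context_pos(x.transpose(1, 2))     # convolve over the sequence dimension
        return x + pos.transpose(1, 2)                # word embedding + context-aware position signal


if __name__ == "__main__":
    tokens = torch.randint(0, 1000, (2, 7))           # toy batch: 2 sentences of 7 tokens
    baseline = StaticPositionalEmbedding(1000, 64)
    proposed = ContextAwarePositionalRepresentation(1000, 64)
    print(baseline(tokens).shape, proposed(tokens).shape)  # both: torch.Size([2, 7, 64])

In both cases the resulting representation is fed to the (multi-head, multi-layer) self-attention stack; the only difference is whether the position signal is a fixed function of the index or of the local context.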



Updated: 2021-05-06