Alleviating the Inequality of Attention Heads for Neural Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2020-09-21, DOI: arxiv-2009.09672
Zewei Sun, Shujian Huang, Xinyu Dai, Jiajun Chen

Recent studies show that the attention heads in Transformer are not equal. We attribute this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific variants. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
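Below is a minimal sketch of what masking attention heads during training can look like, assuming one variant simply zeroes out a random subset of head outputs at each step so the model cannot over-rely on a few dominant heads. The abstract does not specify the exact HeadMask variants, so the class name, the `mask_ratio` hyperparameter, and the random-selection strategy here are illustrative assumptions rather than the authors' reference implementation.

```python
# Hypothetical sketch of random head masking in multi-head attention (PyTorch).
# Not the paper's official code; the masking strategy is an assumption.
import torch
import torch.nn as nn


class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, mask_ratio: float = 0.25):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.mask_ratio = mask_ratio  # assumed fraction of heads to drop per step
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        # Project and split into heads: (batch, heads, seq_len, d_head)
        q = self.q_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product attention per head
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # (batch, heads, seq_len, d_head)

        if self.training and self.mask_ratio > 0:
            # Randomly zero out a subset of head outputs for this training step
            n_mask = int(self.num_heads * self.mask_ratio)
            masked = torch.randperm(self.num_heads, device=x.device)[:n_mask]
            head_mask = torch.ones(self.num_heads, device=x.device)
            head_mask[masked] = 0.0
            heads = heads * head_mask.view(1, self.num_heads, 1, 1)

        # Concatenate heads and project back to d_model
        out = heads.transpose(1, 2).reshape(b, t, self.num_heads * self.d_head)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = MaskedMultiHeadAttention(d_model=512, num_heads=8, mask_ratio=0.25)
    layer.train()
    x = torch.randn(2, 10, 512)
    print(layer(x).shape)  # torch.Size([2, 10, 512])
```

Masking is applied only in training mode, so inference still uses all heads; the intended effect, under this assumption, is to spread useful behavior across heads rather than letting a few heads dominate.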

Updated: 2020-09-22