Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models
arXiv - CS - Computation and Language. Pub Date: 2021-06-10. arXiv: 2106.05505
Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu

In this paper, we detail the relationship between convolutions and self-attention in natural language tasks. We show that relative position embeddings in self-attention layers are equivalent to recently-proposed dynamic lightweight convolutions, and we consider multiple new ways of integrating convolutions into Transformer self-attention. Specifically, we propose composite attention, which unites previous relative position embedding methods under a convolutional framework. We conduct experiments by training BERT with composite attention, finding that convolutions consistently improve performance on multiple downstream tasks, replacing absolute position embeddings. To inform future work, we present results comparing lightweight convolutions, dynamic convolutions, and depthwise-separable convolutions in language model pre-training, considering multiple injection points for convolutions in self-attention layers.
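To make the stated equivalence concrete, below is a minimal sketch (not the authors' implementation) of single-head self-attention with a learned bias indexed by clipped relative position. Because the bias depends only on the offset i - j and is shared across positions, it behaves like a fixed-width convolution kernel applied over relative offsets; the function name, tensor shapes, and clipping scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_with_relative_bias(q, k, v, rel_bias, max_dist):
    """Single-head attention with a learned bias indexed by relative
    position (i - j), clipped to [-max_dist, max_dist]. The shared bias
    acts like a convolution kernel of width 2 * max_dist + 1."""
    n, d = q.shape
    scores = q @ k.t() / d ** 0.5                       # content term, shape (n, n)
    idx = torch.arange(n)
    # relative offsets i - j, clipped and shifted to index into rel_bias
    rel = (idx[:, None] - idx[None, :]).clamp(-max_dist, max_dist) + max_dist
    scores = scores + rel_bias[rel]                      # positional "kernel" term, (n, n)
    probs = F.softmax(scores, dim=-1)
    return probs @ v

# Toy usage with hypothetical sizes
n, d, max_dist = 6, 8, 3
q, k, v = (torch.randn(n, d) for _ in range(3))
rel_bias = torch.randn(2 * max_dist + 1)                 # one weight per relative offset
out = attention_with_relative_bias(q, k, v, rel_bias, max_dist)
print(out.shape)  # torch.Size([6, 8])
```

In this reading, replacing the scalar bias per offset with attention-dependent weights recovers a dynamic lightweight convolution, which is the connection the paper formalizes.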

Updated: 2021-06-11