Multi-Stream Transformers
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-07-21, DOI: arxiv-2107.10342
Mikhail Burtsev, Anna Rumshisky

Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer. We investigate the effects of allowing the encoder to preserve and explore alternative hypotheses, combined at the end of the encoding process. To that end, we design and examine a $\textit{Multi-stream Transformer}$ architecture and find that splitting the Transformer encoder into multiple encoder streams and allowing the model to merge multiple representational hypotheses improves performance, with further improvement obtained by adding a skip connection between the first and the final encoder layer.
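To make the described architecture concrete, here is a minimal PyTorch sketch of the multi-stream idea under stated assumptions: a shared first encoder layer whose output feeds several parallel encoder streams, stream outputs merged at the end of encoding (by averaging here), and a skip connection from the first encoder layer to the final merged representation. The class, the merge-by-mean choice, and the hyperparameter defaults are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiStreamEncoder(nn.Module):
    """Sketch of a multi-stream Transformer encoder (illustrative, not the paper's code)."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, num_streams=2):
        super().__init__()
        # Shared first layer; its output is also reused by the skip connection below.
        self.first_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Independent parallel streams over the remaining encoder layers.
        self.streams = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
                num_layers=num_layers - 1,
            )
            for _ in range(num_streams)
        ])

    def forward(self, x, src_key_padding_mask=None):
        h0 = self.first_layer(x, src_key_padding_mask=src_key_padding_mask)
        # Each stream explores its own representational hypothesis from the shared input.
        outs = [s(h0, src_key_padding_mask=src_key_padding_mask) for s in self.streams]
        # Merge the hypotheses (averaging assumed) and add the first-to-last skip connection.
        return torch.stack(outs, dim=0).mean(dim=0) + h0


# Usage example: encode a batch of 4 sequences of length 10.
encoder = MultiStreamEncoder()
tokens = torch.randn(4, 10, 512)
encoded = encoder(tokens)  # shape: (4, 10, 512), fed to the decoder as usual
```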

Updated: 2021-07-23