Do Transformer Modifications Transfer Across Implementations and Applications?
arXiv - CS - Computation and Language · Pub Date: 2021-02-23 · arXiv:2102.11972
Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.
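To make concrete what a "relatively minor change" to the Transformer can look like, below is a minimal PyTorch sketch contrasting the standard feed-forward sublayer with a GLU-variant feed-forward layer (GeGLU), one of the activation modifications this paper evaluates. The class names, dimensions, and structure are illustrative assumptions for this sketch, not code from the paper's codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    """Standard Transformer feed-forward sublayer: Linear -> ReLU -> Linear."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff)
        self.wo = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.wo(F.relu(self.wi(x)))

class GeGLUFFN(nn.Module):
    """GLU-variant feed-forward sublayer (GeGLU): the hidden activation is
    gated by a second input projection -- a small architectural change."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff)  # value projection
        self.wg = nn.Linear(d_model, d_ff)  # gate projection
        self.wo = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.wo(F.gelu(self.wg(x)) * self.wi(x))

# Hypothetical shapes for illustration: (batch, sequence, d_model)
x = torch.randn(2, 16, 512)
print(FFN(512, 2048)(x).shape)       # torch.Size([2, 16, 512])
print(GeGLUFFN(512, 2048)(x).shape)  # torch.Size([2, 16, 512])
```

Both sublayers are drop-in replacements for one another at the same model dimension, which is part of what makes such variants easy to propose yet hard to compare fairly across codebases.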

Updated: 2021-02-25