Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
arXiv - CS - Computation and Language | Pub Date: 2021-09-17 | DOI: arxiv-2109.08406
Jason Phang, Haokun Liu, Samuel R. Bowman

Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning. In this work, we use centered kernel alignment (CKA), a method for comparing learned representations, to measure the similarity of representations in task-tuned models across layers. In experiments across twelve NLU tasks, we discover a consistent block diagonal structure in the similarity of representations within fine-tuned RoBERTa and ALBERT models, with strong similarity within clusters of earlier and later layers, but not between them. The similarity of later layer representations implies that later layers only marginally contribute to task performance, and we verify in experiments that the top few layers of fine-tuned Transformers can be discarded without hurting performance, even with no further tuning.
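
For reference, the linear variant of CKA can be computed directly from two layers' activation matrices: the squared Frobenius norm of the cross-covariance, normalized by the norms of the two self-covariances. The sketch below is a minimal, illustrative NumPy implementation rather than the authors' code; the function name linear_cka and the toy matrix sizes are our own choices. In the paper's setting, x and y would hold hidden states from two layers of a fine-tuned RoBERTa or ALBERT model on the same batch of inputs.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices of shape (n_examples, dim)."""
    # Center each feature dimension over the examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = np.linalg.norm(x.T @ x, ord="fro") * np.linalg.norm(y.T @ y, ord="fro")
    return float(numerator / denominator)


# Toy usage with arbitrary sizes (1024 examples, 32 dimensions).
rng = np.random.default_rng(0)
layer_a = rng.standard_normal((1024, 32))
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # random orthogonal matrix
print(linear_cka(layer_a, layer_a @ q))               # ~1.0: CKA ignores orthogonal rotations
print(linear_cka(layer_a, rng.standard_normal((1024, 32))))  # much lower: unrelated features
```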

Updated: 2021-09-20