Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
arXiv - CS - Human-Computer Interaction | Pub Date: 2020-09-03 | DOI: arxiv-2009.07053
Joseph F DeRose, Jiayao Wang, and Matthew Berger

Advances in language modeling have led to the development of deep attention-based models that are performant across a wide variety of natural language processing (NLP) problems. These language models are typically pre-trained on large unlabeled text corpora and subsequently fine-tuned for specific tasks. Although considerable work has been devoted to understanding the attention mechanisms of pre-trained models, it is less understood how a model's attention mechanisms change when it is trained for a target NLP task. In this paper, we propose a visual analytics approach to understanding fine-tuning in attention-based language models. Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models. To help users gain insight into how a classification decision is made, our design is centered on depicting classification-based attention at the deepest layer and how attention from prior layers flows through the words in the input. Attention Flows supports the analysis of a single model, as well as the visual comparison of pre-trained and fine-tuned models via their similarities and differences. We use Attention Flows to study attention mechanisms in various sentence understanding tasks and highlight how attention evolves to address the nuances of solving these tasks.
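For readers who want to inspect such attention patterns directly, the following is a minimal sketch (not the paper's Attention Flows implementation): it extracts per-layer, per-head attention from a pre-trained BERT model and from an assumed fine-tuned checkpoint, then scores how similar corresponding heads remain after fine-tuning. The fine-tuned model name and the cosine-similarity comparison are illustrative assumptions.

```python
# Sketch: compare per-layer, per-head attention between a pre-trained and a
# fine-tuned BERT model. The fine-tuned checkpoint name and the head-by-head
# cosine-similarity score are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel

sentence = "The movie was surprisingly good."

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer(sentence, return_tensors="pt")

def attention_tensor(model_name: str) -> torch.Tensor:
    """Return attention weights as a (layers, heads, seq_len, seq_len) tensor."""
    model = AutoModel.from_pretrained(model_name, output_attentions=True)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer
    return torch.stack(outputs.attentions).squeeze(1)

pretrained = attention_tensor("bert-base-uncased")
# Assumed fine-tuned sentiment-classification checkpoint (swap in your own)
finetuned = attention_tensor("textattack/bert-base-uncased-SST-2")

# Cosine similarity between corresponding heads: one score per (layer, head)
flat_pre = pretrained.flatten(start_dim=2)   # (layers, heads, seq*seq)
flat_fin = finetuned.flatten(start_dim=2)
similarity = torch.nn.functional.cosine_similarity(flat_pre, flat_fin, dim=-1)

for layer in range(similarity.shape[0]):
    print(f"layer {layer:2d}:",
          " ".join(f"{s:.2f}" for s in similarity[layer].tolist()))
```

Heads whose similarity drops sharply after fine-tuning are natural starting points for the kind of layer-by-layer, head-by-head comparison the visualization is designed to support.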

Updated: 2020-09-16