HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
arXiv - CS - Information Retrieval. Pub Date: 2020-07-12, DOI: arxiv-2007.05891
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. This approach leads to a higher overall parameter cost, along with a higher maintenance burden when serving multiple models. Learning a single multi-task model that does well on all tasks is a challenging yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections, which help specialize regions of the weight matrices for different tasks. To construct this hypernetwork, our method learns the interaction and composition between a global (task-agnostic) state and a local task-specific state. We apply HyperGrid to the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks with only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
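The abstract describes the core mechanism only at a high level: an outer-product "grid" of gates, formed from a global task-agnostic state and a local input-conditioned state, is tiled block-wise over a weight matrix so different tasks activate different regions. The following PyTorch sketch illustrates one plausible reading of that idea. The class name `GridGatedLinear`, the `n_rows`/`n_cols` hyperparameters, and the sigmoid composition are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GridGatedLinear(nn.Module):
    """A minimal sketch of grid-wise decomposable gating (assumed reading
    of HyperGrid, not the authors' reference implementation)."""

    def __init__(self, d_in, d_out, n_rows=4, n_cols=4):
        super().__init__()
        assert d_in % n_rows == 0 and d_out % n_cols == 0
        self.weight = nn.Parameter(torch.empty(d_in, d_out))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(d_out))
        # local state: projects each input to one value per grid row
        self.local_proj = nn.Linear(d_in, n_rows)
        # global task-agnostic state: a learned vector, one value per grid column
        self.global_state = nn.Parameter(torch.randn(n_cols) * 0.02)
        self.r_in, self.r_out = d_in // n_rows, d_out // n_cols

    def forward(self, x):
        # x: (batch, d_in)
        local = self.local_proj(x)                          # (batch, n_rows)
        # outer product composes local and global states into a gate grid
        grid = local.unsqueeze(-1) * self.global_state      # (batch, n_rows, n_cols)
        gate = torch.sigmoid(grid)
        # tile each grid cell over an (r_in x r_out) block of the weight matrix
        gate = gate.repeat_interleave(self.r_in, dim=1)
        gate = gate.repeat_interleave(self.r_out, dim=2)    # (batch, d_in, d_out)
        # apply the per-example gated weight matrix
        return torch.einsum('bio,bi->bo', gate * self.weight, x) + self.bias

# usage sketch: gate a T5-style FFN projection for one batch of token vectors
layer = GridGatedLinear(d_in=512, d_out=2048, n_rows=8, n_cols=8)
y = layer(torch.randn(32, 512))  # (32, 2048)
```

Note that materializing the full per-example gate is done here only for clarity; because the grid is a rank-1 block structure, an efficient implementation would avoid expanding it to the full weight shape.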

Updated: 2020-07-14