HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
arXiv - CS - Artificial Intelligence | Pub Date: 2020-07-12 | DOI: arxiv-2007.05891
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Achieving state-of-the-art performance on natural language understanding tasks typically requires fine-tuning a fresh model for every task. This approach incurs a higher overall parameter cost, along with greater maintenance overhead for serving multiple models. Learning a single multi-task model that performs well on all tasks has been a challenging yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections, which specialize regions of the weight matrices for different tasks. To construct the proposed hypernetwork, our method learns the interactions and composition between a global (task-agnostic) state and a local task-specific state. We apply HyperGrid to the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks when using only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
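To make the grid-wise mechanism concrete, here is a minimal NumPy sketch of one way such a decomposable projection could work: a small grid of gates is composed from a global task embedding and a local input-dependent state, then expanded block-wise so each gate scales one region of a shared weight matrix. All names, dimensions, and the sigmoid gating choice are illustrative assumptions based on the abstract's description, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dimensions (hypothetical, chosen only for illustration).
d_in, d_out = 512, 2048      # shape of the shared weight being modulated
rows, cols = 8, 8            # grid resolution: weight carved into 8x8 blocks

# Base (task-agnostic) weight matrix shared by all tasks.
W = rng.standard_normal((d_in, d_out)) * 0.02

# Global state: a learned per-task embedding; local state: a projection of
# the token representation. Both names are illustrative placeholders.
task_embedding = rng.standard_normal(rows)       # task vector -> grid rows
local_proj = rng.standard_normal((d_in, cols))   # input -> grid columns

def hypergrid_weight(x, W, task_embedding, local_proj):
    """Compose global and local states into a grid of gates, then expand
    the grid block-wise so each gate specializes one region of W."""
    local_state = x @ local_proj                  # (cols,) input-dependent state
    grid = sigmoid(np.outer(task_embedding, local_state))  # (rows, cols) gates
    # Expand each grid cell over its block: (rows, cols) -> (d_in, d_out).
    mask = np.repeat(np.repeat(grid, d_in // rows, axis=0),
                     d_out // cols, axis=1)
    return W * mask                               # task-specialized weight

x = rng.standard_normal(d_in)
y = x @ hypergrid_weight(x, W, task_embedding, local_proj)
print(y.shape)  # (2048,)
```

Note that under these assumptions the gate grid has only rows x cols entries per task, a negligible overhead relative to the d_in x d_out weight it specializes, which is consistent with the abstract's claim that a single multi-task model can avoid the parameter cost of per-task fine-tuning.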

Last updated: 2020-07-14