Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
arXiv - CS - Machine Learning. Pub Date: 2022-12-20, DOI: arxiv-2212.10503
Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

Prior work has shown that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. In this work, we propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model, and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model and build a mini-model by extracting and freezing a few layers and learning a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.4x less compute.
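For illustration, here is a minimal PyTorch sketch of the MiniJoint idea: a single transformer stack with a secondary MLM head attached at a middle layer, so the bottom layers can later act as a standalone mini-model for cheaply training new language embeddings. The class name, hyperparameters, and equal loss weighting below are assumptions for the sketch, not the authors' implementation; token masking and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

class MiniJointMLM(nn.Module):
    """Hypothetical MiniJoint-style model: primary MLM head on the full stack,
    secondary MLM head at a middle layer (the 'mini-model' exit)."""
    def __init__(self, vocab_size=32000, d_model=768, n_heads=12,
                 n_layers=12, mid_layer=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_layers)
        ])
        self.mid_layer = mid_layer
        self.head_full = nn.Linear(d_model, vocab_size)  # primary MLM head
        self.head_mini = nn.Linear(d_model, vocab_size)  # secondary head at mid layer

    def forward(self, input_ids):
        h = self.embed(input_ids)
        mid_logits = None
        for i, block in enumerate(self.layers, start=1):
            h = block(h)
            if i == self.mid_layer:
                mid_logits = self.head_mini(h)  # mini-model prediction
        return self.head_full(h), mid_logits

# Joint pretraining step: sum the MLM losses from both heads
# (equal weighting here is an assumption for the sketch).
model = MiniJointMLM()
ids = torch.randint(0, 32000, (2, 16))
labels = torch.randint(0, 32000, (2, 16))
full_logits, mini_logits = model(ids)
loss = nn.functional.cross_entropy(full_logits.flatten(0, 1), labels.flatten()) \
     + nn.functional.cross_entropy(mini_logits.flatten(0, 1), labels.flatten())
loss.backward()
```

After pretraining, only the embedding layer and the first `mid_layer` blocks plus `head_mini` would be needed to train new language-specific embeddings, which are then plugged back into the full model for cross-lingual transfer.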

Updated: 2022-12-21