New Students on Sesame Street: What Order-Aware Matrix Embeddings Can Learn from BERT
arXiv - CS - Computation and Language. Pub Date: 2021-09-17. DOI: arxiv-2109.08449
Lukas Galke, Isabelle Cuber, Christoph Meyer, Henrik Ferdinand Nölscher, Angelina Sonderecker, Ansgar Scherp

Large-scale pretrained language models (PreLMs) are revolutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive in low-resource or large-scale applications. While common approaches reduce the size of PreLMs via same-architecture distillation or pruning, we explore distilling PreLMs into more efficient order-aware embedding models. Our results on the GLUE benchmark show that embedding-centric students, which have learned from BERT, yield scores comparable to DistilBERT on QQP and RTE, often match or exceed the scores of ELMo, and only fall behind on detecting linguistic acceptability.
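The core idea of the abstract, distilling a large BERT teacher into a far smaller order-aware embedding student, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the names (OrderAwareStudent, distill_step), the dimensions, the temperature, and the stand-in teacher logits are illustrative assumptions. The order-aware component below follows the general idea of matrix embeddings composed by ordered multiplication, combined with a plain sum-of-vectors component, and is trained to match the teacher's soft labels.

```python
# Hypothetical sketch of embedding-centric distillation from a BERT teacher.
# All names and hyperparameters are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrderAwareStudent(nn.Module):
    """Order-invariant sum of word vectors combined with an order-aware
    product of per-token matrices, feeding a small classifier head."""
    def __init__(self, vocab_size, dim=20, num_labels=2):
        super().__init__()
        self.word_vecs = nn.Embedding(vocab_size, dim)          # order-invariant part
        self.word_mats = nn.Embedding(vocab_size, dim * dim)    # order-aware part
        # Initialize token matrices near the identity so early products stay stable.
        with torch.no_grad():
            eye = torch.eye(dim).flatten()
            self.word_mats.weight.copy_(eye + 0.01 * torch.randn_like(self.word_mats.weight))
        self.dim = dim
        self.classifier = nn.Linear(dim + dim * dim, num_labels)

    def forward(self, token_ids):                               # token_ids: (batch, seq_len)
        b, t = token_ids.shape
        d = self.dim
        summed = self.word_vecs(token_ids).sum(dim=1)           # (batch, dim)
        mats = self.word_mats(token_ids).view(b, t, d, d)
        prod = torch.eye(d, device=token_ids.device).expand(b, d, d)
        for i in range(t):                                      # ordered matrix product
            prod = prod @ mats[:, i]
        features = torch.cat([summed, prod.flatten(1)], dim=-1)
        return self.classifier(features)

def distill_step(student, teacher_logits, token_ids, optimizer, T=2.0):
    """One distillation step: match the frozen teacher's soft labels via KL divergence."""
    student_logits = student(token_ids)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = OrderAwareStudent(vocab_size=30522, dim=20, num_labels=2)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    token_ids = torch.randint(0, 30522, (4, 12))                # stand-in tokenized batch
    teacher_logits = torch.randn(4, 2)                          # stand-in BERT teacher outputs
    print(distill_step(student, teacher_logits, token_ids, opt))
```

In practice the teacher logits would come from a fine-tuned BERT run over the same batch with its own tokenizer; the stand-in tensors above only keep the sketch self-contained.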

Updated: 2021-09-20