Finnish Language Modeling with Deep Transformer Models
arXiv - CS - Computation and Language. Pub Date: 2020-03-14, DOI: arxiv-2003.11562
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo

Transformers have recently taken center stage in language modeling, after LSTMs were long considered the dominant model architecture. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is, as far as we know, the first such measure reported. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
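The pseudo-perplexity mentioned in the abstract is the standard way to assign a perplexity-like score to a bidirectional masked language model such as BERT: each token is masked in turn, the model scores the true token given the rest of the sentence, and the averaged negative log-probability is exponentiated. The sketch below illustrates this metric under stated assumptions; it is not the paper's own evaluation code, and the checkpoint name TurkuNLP/bert-base-finnish-cased-v1 is a publicly available Finnish BERT used here as a stand-in for the model trained in the paper.

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # Stand-in checkpoint (assumption): a public Finnish BERT, not the paper's model.
    model_name = "TurkuNLP/bert-base-finnish-cased-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    def pseudo_perplexity(sentence: str) -> float:
        # Mask each token in turn and score the true token under the MLM:
        # PPPL = exp(-(1/N) * sum_i log P(token_i | all other tokens)).
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        n = ids.size(0) - 2  # exclude the [CLS] and [SEP] special tokens
        total_log_prob = 0.0
        with torch.no_grad():
            for i in range(1, n + 1):  # positions of the real (sub-word) tokens
                masked = ids.clone()
                masked[i] = tokenizer.mask_token_id
                logits = model(masked.unsqueeze(0)).logits[0, i]
                log_probs = torch.log_softmax(logits, dim=-1)
                total_log_prob += log_probs[ids[i]].item()
        return float(torch.exp(torch.tensor(-total_log_prob / n)))

    print(pseudo_perplexity("Transformerit ovat muuttaneet kielimallinnuksen."))

Note that scoring a sentence this way costs one forward pass per token, which is why pseudo-perplexity is reported separately from the ordinary left-to-right perplexity used for Transformer-XL and the LSTM baseline.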

Updated: 2020-03-30