Finnish Language Modeling with Deep Transformer Models
arXiv - CS - Computation and Language. Pub Date: 2020-03-14, DOI: arxiv-2003.11562
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo

Transformers have recently taken center stage in language modeling, after LSTMs were long considered the dominant model architecture. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is, as far as we know, the first such measure reported. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
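The pseudo-perplexity mentioned in the abstract is the standard way to assign a perplexity-like score to a bidirectional masked language model such as BERT: each token is masked in turn, the model scores the true token given the rest of the sentence, and the averaged negative log-probability is exponentiated. The sketch below illustrates this metric under stated assumptions; it is not the paper's own evaluation code, and the checkpoint name TurkuNLP/bert-base-finnish-cased-v1 is a publicly available Finnish BERT used here as a stand-in for the model trained in the paper.

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # Stand-in checkpoint (assumption): a public Finnish BERT, not the paper's model.
    model_name = "TurkuNLP/bert-base-finnish-cased-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    def pseudo_perplexity(sentence: str) -> float:
        # Mask each token in turn and score the true token under the MLM:
        # PPPL = exp(-(1/N) * sum_i log P(token_i | all other tokens)).
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        n = ids.size(0) - 2  # exclude the [CLS] and [SEP] special tokens
        total_log_prob = 0.0
        with torch.no_grad():
            for i in range(1, n + 1):  # positions of the real (sub-word) tokens
                masked = ids.clone()
                masked[i] = tokenizer.mask_token_id
                logits = model(masked.unsqueeze(0)).logits[0, i]
                log_probs = torch.log_softmax(logits, dim=-1)
                total_log_prob += log_probs[ids[i]].item()
        return float(torch.exp(torch.tensor(-total_log_prob / n)))

    print(pseudo_perplexity("Transformerit ovat muuttaneet kielimallinnuksen."))

Note that scoring a sentence this way costs one forward pass per token, which is why pseudo-perplexity is reported separately from the ordinary left-to-right perplexity used for Transformer-XL and the LSTM baseline.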

Updated: 2020-03-30