A Continuous Space Neural Language Model for Bengali Language
arXiv - CS - Computation and Language. Pub Date: 2020-01-11. DOI: arXiv-2001.05315
Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Anisur Rahman, Aisha Khatun, Md. Saiful Islam

Language models are generally employed to estimate the probability distribution of various linguistic units, making them one of the fundamental components of natural language processing. Their applications span a wide spectrum of tasks such as text summarization, translation, and classification. For a low-resource language like Bengali, research in this area has so far been limited, with only a few traditional count-based models proposed. This paper addresses that gap and proposes a continuous-space neural language model, more specifically an ASGD weight-dropped LSTM language model, along with techniques to train it efficiently for Bengali. A performance comparison against currently existing count-based models shows that the proposed architecture outperforms its counterparts, achieving an inference perplexity as low as 51.2 on the held-out Bengali data set.
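The abstract names the two ingredients of the model, an LSTM whose recurrent weights are "weight dropped" (DropConnect) and training with averaged SGD (ASGD), together with perplexity as the evaluation metric (the exponential of the mean cross-entropy). The sketch below is not the authors' code but a minimal PyTorch illustration of those ingredients; the vocabulary size, layer dimensions, dropout rate, learning rate, and random stand-in corpus are all illustrative assumptions, and the full AWD-LSTM recipe adds further regularization not shown here.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMLM(nn.Module):
    """Single-layer LSTM language model with DropConnect on the recurrent weights."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, weight_drop=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden_dim = hidden_dim
        self.weight_drop = weight_drop
        # Explicit LSTM parameters (input, forget, cell, output gates stacked).
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_dim, emb_dim) * 0.05)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_dim, hidden_dim) * 0.05)
        self.b = nn.Parameter(torch.zeros(4 * hidden_dim))
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        batch, seq_len = tokens.shape
        x = self.embed(tokens)
        # DropConnect ("weight drop"): one dropout mask on the hidden-to-hidden
        # weight matrix, reused across every time step of the sequence.
        w_hh = F.dropout(self.w_hh, p=self.weight_drop, training=self.training)
        h = x.new_zeros(batch, self.hidden_dim)
        c = x.new_zeros(batch, self.hidden_dim)
        outputs = []
        for t in range(seq_len):
            gates = x[:, t] @ self.w_ih.T + h @ w_hh.T + self.b
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return self.decoder(torch.stack(outputs, dim=1))  # (batch, seq, vocab)

# Training with averaged SGD and evaluation by perplexity (toy data as a stand-in
# for a real Bengali corpus).
vocab_size = 1000                                    # assumption: toy vocabulary
model = WeightDropLSTMLM(vocab_size)
optimizer = torch.optim.ASGD(model.parameters(), lr=1.0)
data = torch.randint(0, vocab_size, (32, 41))        # random stand-in corpus
inputs, targets = data[:, :-1], data[:, 1:]

for step in range(5):
    optimizer.zero_grad()
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)
    optimizer.step()

model.eval()
with torch.no_grad():
    logits = model(inputs)
    nll = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    # Perplexity = exp(mean negative log-likelihood); the paper reports 51.2
    # on its real held-out Bengali data.
    print(f"perplexity = {math.exp(nll.item()):.1f}")
```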

Updated: 2020-01-16