Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition
Speech Communication ( IF 3.2 ) Pub Date : 2020-01-23 , DOI: 10.1016/j.specom.2020.01.001
Ri Hyon Sun , Ri Jong Chol

This paper focuses on an adaptable continuous-space language modeling approach that combines the longer-context information of a recurrent neural network (RNN) with the adaptation ability of the subspace Gaussian mixture model (SGMM), which has been widely used in acoustic modeling for automatic speech recognition (ASR).

In large vocabulary continuous speech recognition (LVCSR), it is a challenging problem to construct language models that capture longer-range word context while ensuring generalization and adaptation ability. Recently, language modeling based on RNNs and their variants has been widely studied in this field.
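To make the notion of a "longer-context history feature vector" concrete, the sketch below runs a toy Elman-style RNN over a word sequence; the final hidden state is a fixed-length summary of the whole history. The dimensions, weights, and one-hot encoding here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Toy dimensions and random weights; purely illustrative, not the paper's setup.
VOCAB, HIDDEN = 5, 4
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # recurrent weights

def rnn_history(word_ids, h=None):
    """Run an Elman RNN over a word-id sequence; the final hidden state
    serves as a fixed-length history feature vector for the context."""
    if h is None:
        h = np.zeros(HIDDEN)
    for w in word_ids:
        x = np.zeros(VOCAB)
        x[w] = 1.0                       # one-hot encoding of the current word
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrence carries earlier context
    return h

h = rnn_history([0, 3, 1])
```

Because the recurrence folds every earlier word into `h`, the vector's length is independent of how long the history is, which is what makes it usable as a continuous-space feature for downstream modeling.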

The goal of our approach is to obtain history feature vectors of a word carrying longer context information and to model every word with a subspace Gaussian mixture model, analogous to the Tandem system used in acoustic modeling for ASR. We also apply the fMLLR adaptation method, widely used in SGMM-based acoustic modeling, to adapt the subspace Gaussian mixture based language model (SGMLM).
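A minimal sketch of the scoring idea, under assumptions of my own: in an SGMM, Gaussian parameters are shared globally, each word contributes only a low-dimensional subspace vector `v[w]`, component means come from shared projections `M[i] @ v[w]`, and mixture weights from a softmax over `wvec @ v[w]`. An fMLLR-style adaptation is an affine transform of the history feature before scoring. All sizes and parameter values below are toy placeholders, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)
D, S, I, V = 4, 3, 2, 5   # feature dim, subspace dim, #Gaussians, vocab (toy sizes)

M = rng.normal(scale=0.1, size=(I, D, S))   # shared mean-projection matrices
wvec = rng.normal(size=(I, S))              # shared weight-projection vectors
var = np.full((I, D), 0.5)                  # shared diagonal covariances
v = rng.normal(size=(V, S))                 # one low-dim subspace vector per word

def log_gauss_diag(x, mu, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def word_loglik(h, w, A=None, b=None):
    """Log-likelihood of history vector h under word w's subspace GMM.
    If (A, b) are given, apply an fMLLR-style affine transform first."""
    if A is not None:
        h = A @ h + b                               # feature-space adaptation
    logits = wvec @ v[w]                            # weights derived from subspace
    logw = logits - np.logaddexp.reduce(logits)     # log-softmax over components
    comps = [logw[i] + log_gauss_diag(h, M[i] @ v[w], var[i]) for i in range(I)]
    return np.logaddexp.reduce(comps)               # log-sum over mixture

h = rng.normal(size=D)                              # stand-in history vector
scores = np.array([word_loglik(h, w) for w in range(V)])
probs = np.exp(scores - np.logaddexp.reduce(scores))  # normalize over vocabulary
```

The design point this illustrates: because `M`, `wvec`, and `var` are shared, adapting them (or applying one affine transform `A, b` to the features) adapts every word's distribution at once, which is what makes fMLLR-style adaptation cheap for such a model.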

After fMLLR adaptation, the Top-Down and Bottom-Up SGMLMs obtain WERs of 5.70% and 6.01%, better than the 4.15% and 4.61% obtained without adaptation, respectively. Also, with fMLLR adaptation, the Top-Down and Bottom-Up SGMLMs yield absolute word error rate reductions of 1.48% and 1.02% and relative perplexity reductions of 10.02% and 6.46%, respectively, compared with an RNNLM without adaptation.
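The two metric conventions used above are easy to conflate, so here is how they are computed; the numbers in the example are made up for illustration and are not the paper's results:

```python
def abs_wer_reduction(wer_base, wer_new):
    """Absolute WER reduction: plain difference in percentage points."""
    return wer_base - wer_new

def rel_ppl_reduction(ppl_base, ppl_new):
    """Relative perplexity reduction: percentage of the baseline removed."""
    return 100.0 * (ppl_base - ppl_new) / ppl_base

# Hypothetical baseline and adapted values, for illustration only.
wer_gain = abs_wer_reduction(10.0, 8.5)    # 1.5 percentage points absolute
ppl_gain = rel_ppl_reduction(120.0, 108.0)  # 10.0% relative
```

Note the asymmetry: WER reductions are usually reported as absolute percentage-point differences, while perplexity reductions are usually reported relative to the baseline.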



