Neural candidate-aware language models for speech recognition
Computer Speech & Language (IF 3.1) Pub Date: 2020-09-24, DOI: 10.1016/j.csl.2020.101157
Tomohiro Tanaka , Ryo Masumura , Takanobu Oba

This paper presents novel neural network based language models that can correct automatic speech recognition (ASR) errors by using speech recognizer outputs as context. The proposed models, called neural candidate-aware language models (NCALMs), estimate the generative probability of a target sentence while taking the ASR outputs, i.e., the hypotheses and their posterior probabilities, into account. Neural network language models have recently achieved great success in the ASR field because of their ability to learn long-range contexts and to model word representations in a continuous space. However, they estimate a sentence probability without considering the other candidates and their posterior probabilities, even though these competing hypotheses are available and carry information that can improve recognition accuracy. To overcome this limitation, the key idea is to utilize ASR outputs in both the training phase and the inference phase. The proposed models are conditional generative models consisting of a Transformer encoder and a Transformer decoder: the encoder embeds the candidates as context vectors, and the decoder estimates a sentence probability given those context vectors. The models are evaluated on a Japanese lecture transcription task and an English conversational speech recognition task. Experimental results show that an NCALM achieves better ASR performance than a baseline built on a deep neural network-hidden Markov model (DNN-HMM) hybrid system. ASR performance improves further when an NCALM and a Transformer language model are used simultaneously.
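The abstract describes scoring a target sentence conditioned on the ASR N-best list and its posterior probabilities. The sketch below is a hypothetical, greatly simplified stand-in for that interface: instead of the paper's Transformer encoder-decoder, it pools posterior-weighted word evidence from the candidates and scores a target sentence against that pooled context. All function names and the smoothing constant are illustrative assumptions, not the authors' implementation.

```python
import math

def candidate_context(nbest):
    """Pool word evidence from ASR candidates, weighted by posterior.

    nbest: list of (hypothesis_tokens, posterior_probability) pairs.
    Returns a dict mapping each word to its posterior-weighted support,
    a crude stand-in for the encoder's context vectors.
    """
    support = {}
    for tokens, posterior in nbest:
        for tok in tokens:
            support[tok] = support.get(tok, 0.0) + posterior
    return support

def score_sentence(target, nbest, unk_logprob=-10.0):
    """Toy conditional log-probability of `target` given the candidates.

    Each target word contributes the log of its normalized
    posterior-weighted support; words absent from every candidate fall
    back to `unk_logprob` (an assumed smoothing constant).
    """
    support = candidate_context(nbest)
    total = sum(support.values()) or 1.0
    logp = 0.0
    for tok in target:
        p = support.get(tok, 0.0) / total
        logp += math.log(p) if p > 0 else unk_logprob
    return logp

# Illustrative N-best list with posterior probabilities.
nbest = [
    (["the", "cat", "sat"], 0.6),   # top ASR hypothesis
    (["the", "cats", "at"], 0.3),   # competing hypothesis
    (["a", "cat", "sat"], 0.1),
]
# Candidate-aware scoring favors the transcript supported by the list:
print(score_sentence(["the", "cat", "sat"], nbest))
print(score_sentence(["the", "cats", "at"], nbest))
```

In the actual NCALM, this conditioning is learned: the Transformer encoder maps candidates and posteriors to context vectors, and the decoder produces the sentence probability, so the model can also recover words that appear in no candidate.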




Updated: 2020-10-30