Dynamic out-of-vocabulary word registration to language model for speech recognition, EURASIP Journal on Audio, Speech, and Music Processing

Dynamic out-of-vocabulary word registration to language model for speech recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-01-25, DOI: 10.1186/s13636-020-00193-1
Norihide Kitaoka, Bohan Chen, Yuya Obashi

We propose a method for dynamically registering out-of-vocabulary (OOV) words: OOV tokens are inserted into the language model in advance, and a new word is registered by editing a token's pronunciation so that it matches that word. To prepare the tokens, we add them to an additional, partial copy of our corpus when training the language model (LM) for speech recognition, placing them either randomly or according to part-of-speech (POS) tags in the selected utterances. The result is an LM containing OOV tokens to which pronunciations can be assigned later. We also investigate how the acoustic complexity of OOV words and their “natural” occurrence frequency affect the recognition of registered OOV words. We evaluate the proposed registration method with two modern automatic speech recognition (ASR) systems, Julius and Kaldi, using DNN-HMM acoustic models and N-gram language models, plus an additional evaluation using RNN re-scoring with Kaldi. The experimental results show that, with the proposed method, modern ASR systems can recognize OOV words without re-training the language model; that the acoustic complexity of OOV words affects OOV recognition; and that differences between the “natural” and the assigned occurrence frequencies of OOV words have little impact on the final recognition results.
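
The two steps described in the abstract (seeding the LM training text with placeholder tokens, then assigning a pronunciation to a token at registration time) can be illustrated roughly as follows. This is only a sketch under several assumptions that the abstract does not confirm: the token names `<OOV1>`, `<OOV2>`, ..., the functions `add_oov_tokens` and `register_oov`, the `copy_ratio` parameter, the dictionary-based lexicon, and the choice to substitute a token for one randomly chosen word are all hypothetical. It covers only the random placement variant, not the POS-based one, and it does not reproduce the Julius or Kaldi dictionary formats.

```python
import random


def add_oov_tokens(utterances, copy_ratio=0.1, n_tokens=3, seed=0):
    """Return the original corpus plus a partial copy containing OOV tokens.

    Assumption: "adding" a token means replacing one randomly chosen word
    in each copied utterance with a placeholder token.
    """
    rng = random.Random(seed)
    tokens = [f"<OOV{i}>" for i in range(1, n_tokens + 1)]  # hypothetical names
    n_copy = min(len(utterances), max(1, int(len(utterances) * copy_ratio)))
    augmented = list(utterances)  # the original corpus is kept unchanged
    for utt in rng.sample(utterances, n_copy):
        words = utt.split()
        if not words:
            continue
        words[rng.randrange(len(words))] = rng.choice(tokens)
        augmented.append(" ".join(words))
    return augmented, tokens


def register_oov(lexicon, token, word, phonemes):
    """Assign a new word's pronunciation to a pre-inserted OOV token.

    Only the lexicon entry is edited; the language model trained on the
    augmented corpus is left untouched.
    """
    if token not in lexicon:
        raise KeyError(f"{token} was not pre-inserted into the lexicon")
    lexicon[token] = {"surface": word, "phonemes": list(phonemes)}
    return lexicon
```

An N-gram LM would be trained once on the augmented text; afterwards, registering a word only means calling something like `register_oov(lexicon, "<OOV1>", "kaldi", ["k", "a", "l", "d", "i"])` on a lexicon that already lists the tokens, which is why no LM re-training is needed.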

Updated: 2021-01-25
