Pronunciation augmentation for Mandarin-English code-switching speech recognition, EURASIP Journal on Audio, Speech, and Music Processing

Pronunciation augmentation for Mandarin-English code-switching speech recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2021-08-30, DOI: 10.1186/s13636-021-00222-7
Yanhua Long¹, Shuang Wei¹, Jie Lian¹, Yijie Li²

Code-switching (CS) refers to the use of more than one language within a single utterance. It poses a great challenge to automatic speech recognition (ASR) because of the language switches inside an utterance, the pronunciation variation of words from the embedded language, and severe training data sparsity. This paper focuses on the Mandarin-English CS ASR task. We aim to handle the pronunciation variation and alleviate the data sparsity of code-switches by using pronunciation augmentation methods. An English-to-Mandarin mixed-language phone mapping approach is first proposed to obtain a language-universal CS lexicon. Based on this lexicon, an acoustic data-driven lexicon learning framework is further proposed to learn new pronunciations that cover the accents, mispronunciations, and pronunciation variations of the embedded English words. Experiments are performed on real CS ASR tasks. The effectiveness of the proposed methods is examined on conventional, hybrid, and recent end-to-end speech recognition systems. Experimental results show that both the learned phone mapping and the augmented pronunciations can significantly improve the performance of code-switching speech recognition.
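The two augmentation steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `EN_TO_ZH` is a made-up, hand-written fragment standing in for the learned English-to-Mandarin phone mapping, English phones are written in ARPAbet without stress markers, and English phones without a mapping are simply kept, so the merged lexicon mixes both phone inventories. `learn_variants` replaces the acoustic data-driven lexicon learning with a simple assumption: a candidate pronunciation is added when its share of a word's alignment counts reaches `min_share`. All function names, counts, and the threshold are illustrative.

```python
from collections import Counter

# Hypothetical fragment of an English (ARPAbet, no stress) -> Mandarin phone map.
# In the paper the mapping is learned; these entries are for illustration only.
EN_TO_ZH = {"HH": "h", "EH": "e", "L": "l", "OW": "ou", "W": "w", "ER": "er", "D": "d"}


def map_pronunciation(en_phones, mapping):
    """Rewrite an English phone sequence with Mandarin phones.

    Phones without a mapping are kept unchanged, so the result may mix
    both phone inventories (an assumption of this sketch).
    """
    return tuple(mapping.get(p, p) for p in en_phones)


def build_cs_lexicon(zh_lexicon, en_lexicon, mapping):
    """Merge a Mandarin lexicon with phone-mapped English entries.

    Both inputs map word -> set of pronunciations (tuples of phones).
    Returns a new dict; the inputs are not modified.
    """
    lexicon = {w: set(prons) for w, prons in zh_lexicon.items()}
    for word, prons in en_lexicon.items():
        mapped = {map_pronunciation(p, mapping) for p in prons}
        lexicon.setdefault(word, set()).update(mapped)
    return lexicon


def learn_variants(lexicon, aligned_counts, min_share=0.2):
    """Add pronunciation variants supported by (hypothetical) alignment counts.

    aligned_counts maps word -> {pronunciation: count}. A variant is added
    when its share of the word's total count is at least min_share; this is
    a simple count-based stand-in for acoustic data-driven lexicon learning.
    """
    augmented = {w: set(prons) for w, prons in lexicon.items()}
    for word, counts in aligned_counts.items():
        total = sum(counts.values())
        for pron, n in counts.items():
            if total and n / total >= min_share:
                augmented.setdefault(word, set()).add(pron)
    return augmented


if __name__ == "__main__":
    zh_lex = {"你好": {("n", "i", "h", "ao")}}
    en_lex = {"hello": {("HH", "EH", "L", "OW")}}
    cs_lex = build_cs_lexicon(zh_lex, en_lex, EN_TO_ZH)
    # Made-up counts of how often each variant of "hello" was observed.
    counts = {"hello": Counter({("h", "e", "l", "ou"): 7,
                                ("h", "a", "l", "ou"): 3,
                                ("x", "e", "l", "ou"): 1})}
    print(sorted(learn_variants(cs_lex, counts)["hello"]))
    # prints [('h', 'a', 'l', 'ou'), ('h', 'e', 'l', 'ou')]
```

In practice, candidate pronunciations would typically come from decoding the training audio and be scored with the acoustic model; the count threshold here only marks where such a selection step fits into the lexicon.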

Updated: 2021-08-30
