Multilingual and multimode phone recognition system for Indian languages
Speech Communication (IF 2.4), Pub Date: 2020-02-26, DOI: 10.1016/j.specom.2020.02.006
Kumud Tripathi, M. Kiran Reddy, K. Sreenivasa Rao

The aim of this paper is to develop a flexible framework capable of automatically recognizing the phonetic units present in a speech utterance of any language, spoken in any mode. In this study, we consider two modes of speech, conversation and read, in four Indian languages: Telugu, Kannada, Odia, and Bengali. The proposed approach consists of two stages: (i) automatic speech mode classification (SMC) and (ii) automatic phoneme recognition using a mode-specific multilingual phone recognition system (MMPRS). Vocal tract and excitation source features are used to classify speech modes with feedforward neural networks (FFNNs). Vocal tract, excitation source, and tandem features are used to train deep neural network (DNN)-based multilingual phone recognition systems (MPRSs). The performance of the proposed approach is compared with baseline mode-dependent and mode-independent MPRSs. Experimental results show that the proposed approach, which combines SMC and MMPRS into a single system, outperforms the baseline phone recognition systems.
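
The two-stage flow described in the abstract can be pictured as a simple dispatch: a mode classifier picks a label, and that label selects which mode-specific recognizer decodes the utterance. The Python sketch below is only an illustration of that routing idea under stated assumptions. The names `recognize_phones`, `toy_mode_classifier`, and `toy_recognizers`, the feature layout, and the toy decision rule are all hypothetical and are not taken from the paper; the paper's FFNN classifier and DNN recognizers are replaced here by trivial stand-ins.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: one feature vector per frame.
Features = Sequence[Sequence[float]]


def recognize_phones(
    features: Features,
    classify_mode: Callable[[Features], str],
    recognizers: Dict[str, Callable[[Features], List[str]]],
) -> List[str]:
    """Stage 1: predict the speech mode. Stage 2: decode with that mode's recognizer."""
    mode = classify_mode(features)
    if mode not in recognizers:
        raise KeyError(f"no recognizer registered for mode {mode!r}")
    return recognizers[mode](features)


# Toy stand-ins so the sketch runs; they are NOT the paper's FFNN or DNN models.
def toy_mode_classifier(features: Features) -> str:
    # Placeholder rule (an assumption): long inputs are "conversation", short are "read".
    return "conversation" if len(features) > 3 else "read"


toy_recognizers: Dict[str, Callable[[Features], List[str]]] = {
    "read": lambda feats: ["a"] * len(feats),
    "conversation": lambda feats: ["b"] * len(feats),
}
```

Keeping the classifier and the recognizers as injected callables means either stage could be swapped for a trained model without touching the routing logic.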




Updated: 2020-02-26