Hindi speech recognition in noisy environment using hybrid technique
International Journal of Information Technology, Pub Date: 2021-01-01, DOI: 10.1007/s41870-020-00586-7
Ashok Kumar, Vikas Mittal

Automatic speech recognition is generally analyzed for two types of utterances: isolated-word and continuous-word speech. Continuous-word speech is close to the natural way of speaking but is difficult for machines (speech recognizers) to recognize. It is also highly sensitive to environmental variations. Several parameters directly affect the performance of automatic speech recognition, such as the size of the dataset/corpus, the type of dataset (isolated, spontaneous or continuous) and environmental variations (noisy/clean). The performance of speech recognizers is generally good in clean environments for isolated words, but it degrades in noisy environments, especially for continuous words/sentences, and remains a challenge. In this paper, a hybrid feature extraction technique is proposed that joins core blocks of perceptual linear predictive (PLP) analysis and Mel frequency cepstral coefficients (MFCC) and can be used to improve the performance of speech recognizers under such conditions. A voice activity detection (VAD)-based frame-dropping formula is used solely within the training part of the automatic speech recognition (ASR) procedure, obviating its need in actual implementations. The motivation for using this formula is to remove pauses and distorted elements of speech, further improving phoneme modeling. The proposed method shows an average improvement in performance of 12.88% for standard datasets.
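
The abstract does not give the VAD formula itself, so the following is only a minimal sketch of the general idea of VAD-based frame dropping, assuming a simple short-time energy threshold. The function names (frame_signal, drop_non_speech_frames), the frame length, the hop size and the -35 dB threshold are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding at the end)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row k holds samples hop*k .. hop*k + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def drop_non_speech_frames(features, frames, threshold_db=-35.0):
    """Drop feature rows whose frame is judged to be a pause.

    Assumption: a frame counts as speech when its energy lies within
    `threshold_db` of the loudest frame in the utterance. This stands in
    for whatever VAD rule the paper actually uses.
    """
    energy = np.sum(frames.astype(np.float64) ** 2, axis=1)
    energy_db = 10.0 * np.log10(energy + 1e-12)  # small offset avoids log(0)
    keep = energy_db > (energy_db.max() + threshold_db)
    return features[keep], keep
```

In line with the abstract, such a filter would be applied to the training utterances only; at recognition time the features would be passed to the recognizer unchanged. Here `features` stands for any per-frame feature matrix (for example the hybrid PLP/MFCC vectors) with one row per row of `frames`.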




Updated: 2021-01-01