Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions, Applied Acoustics


Applied Acoustics (IF 3.4) Pub Date: 2020-12-24, DOI: 10.1016/j.apacoust.2020.107810
Puneet Bawa, Virender Kadyan

The success of any commercial Automatic Speech Recognition (ASR) system depends on the availability of training data. However, its performance degrades when low-resource language corpora lack sufficient signal processing characteristics. Developing a Punjabi children's speech system is one such challenge: it faces zero-resource conditions, and children's speech differs from adult speech in speaking rate and vocal tract length. In this paper, efforts have been made to build a Punjabi children's ASR system under mismatched conditions using noise-robust approaches such as Mel Frequency Cepstral Coefficients (MFCC) or Gammatone Frequency Cepstral Coefficients (GFCC). Acoustic and phonetic variations between adult and children's speech are handled using gender-based in-domain training data augmentation, and acoustic variability among speakers in the training and testing sets is then normalised using Vocal Tract Length Normalization (VTLN). We demonstrate that including pitch features with the test-normalized children's dataset significantly enhances system performance across environmental conditions, i.e., clean or noisy. The experimental results show a relative improvement of 30.94% using adult female speech pooled with limited children's speech, compared with the adult male corpus, under noise-based training data augmentation.
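The abstract does not say how the noise-based training data augmentation was carried out. As a rough illustration only, one common approach is to add a noise recording to each clean utterance at a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach; the helper names `mix_at_snr` and `measure_snr_db`, the 10 dB target, and the synthetic sine and Gaussian signals are all hypothetical and are not taken from the paper.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # Scale the noise so that clean_power / scaled_noise_power = 10^(snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR (dB) of a noisy signal against its clean reference."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Synthetic stand-ins for one second of speech and noise at 16 kHz (assumed rate).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(clean, noise, snr_db=10)
```

In a setup like this, each clean training utterance could be mixed at several SNR levels and pooled with the original data before feature extraction; whether the authors did so is not stated in the abstract.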


Updated: 2020-12-24
