Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition
Pattern Analysis and Applications ( IF 3.9 ) Pub Date : 2019-04-03 , DOI: 10.1007/s10044-019-00816-0
Anirban Bhowmick , Astik Biswas , Mahesh Chandra

Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system built on a psycho-acoustically motivated, wavelet-based front-end compensator. Within the compensator, a voice activity detector based on voiced-speech probability separates voiced from unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, which is used because the distribution of ERB centre frequencies resembles the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with that of different baseline and robust features, as well as with existing front-end compensation techniques. The proposed front-end compensator is additionally evaluated in terms of phoneme classification accuracy. Performance improvements are observed in all of these experiments.
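The abstract does not give the formulas behind the ERB-scaled sub-bands, the ASE, or the Wiener stage, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the widely used Glasberg and Moore ERB approximation, a 0 to 8000 Hz analysis range (TIMIT audio is sampled at 16 kHz), ASE taken as the mean squared coefficient of a sub-band, and a textbook Wiener gain of SNR / (1 + SNR). All function names and the gain floor are hypothetical. A real wavelet packet tree splits bands dyadically, so it could only approximate the band edges computed here.

```python
import math


def erb_bandwidth(f_hz):
    """Glasberg-Moore equivalent rectangular bandwidth (Hz) at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_band_edges(n_bands=24, f_min=0.0, f_max=8000.0):
    """Edges of n_bands sub-bands spaced uniformly on the ERB-rate scale.

    Assumption: 24 bands over 0-8000 Hz; the paper's exact tree is not given.
    """
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]


def average_subband_energy(coeffs):
    """Assumed ASE definition: mean squared value of one sub-band's coefficients."""
    return sum(c * c for c in coeffs) / len(coeffs)


def wiener_gain(noisy_energy, noise_energy, floor=1e-3):
    """Textbook Wiener gain SNR / (1 + SNR), with a hypothetical lower floor.

    The SNR is estimated by subtracting the noise energy from the noisy energy.
    """
    snr = max(noisy_energy - noise_energy, 0.0) / max(noise_energy, 1e-12)
    return max(snr / (1.0 + snr), floor)


if __name__ == "__main__":
    edges = erb_band_edges()
    for low, high in zip(edges[:-1], edges[1:]):
        print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

Spacing the edges uniformly on the ERB-rate scale makes the low-frequency bands narrow and the high-frequency bands wide, which is the cochlea-like resolution the abstract alludes to.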

Updated: 2019-04-03