A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing



IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2013-02-01, DOI: 10.1109/tasl.2012.2219526
Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali


There is strong neurophysiological evidence that the brain processes speech signals along parallel paths that encode complementary information in the signal. These parallel streams are organized around a duality of slow vs. fast: coarse signal dynamics appear to be processed separately from rapidly changing modulations, in both the spectral and temporal dimensions. We adapt this duality in a multistream framework for robust speaker-independent phoneme recognition. The scheme presented here centers on a multi-path bandpass modulation analysis of speech sounds, with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations along the spectral and temporal dimensions, the proposed scheme avoids the classic feature explosion problem of previous multistream approaches while maintaining the advantage of parallelism and localized feature analysis. The proposed architecture yields substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in the presence of nonstationary noise, reverberation, and channel distortions.
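
To make the idea of bandpass operations along the spectral and temporal dimensions concrete, the following is a minimal sketch, not the authors' implementation. It assumes the input is a time-frequency representation (for example a log spectrogram on a logarithmic frequency axis) and applies an ideal bandpass mask in the 2-D Fourier domain. The function names, the `STREAMS` table, and all band edges are hypothetical values chosen only for illustration; the abstract does not specify them.

```python
import numpy as np


def bandpass_modulation_filter(spec, frame_rate, chans_per_octave,
                               rate_band, scale_band):
    """Keep modulations inside rate_band (Hz) and scale_band (cycles/octave).

    spec: 2-D array of shape (n_channels, n_frames).
    frame_rate: frames per second along the time axis.
    chans_per_octave: frequency channels per octave along the spectral axis.
    """
    n_chan, n_frames = spec.shape
    # Modulation axes of the 2-D Fourier transform of the spectrogram.
    rates = np.abs(np.fft.fftfreq(n_frames, d=1.0 / frame_rate))
    scales = np.abs(np.fft.fftfreq(n_chan, d=1.0 / chans_per_octave))
    # Ideal (brick-wall) passbands; symmetric in +/- frequency so the
    # filtered output stays real-valued.
    rate_mask = (rates >= rate_band[0]) & (rates <= rate_band[1])
    scale_mask = (scales >= scale_band[0]) & (scales <= scale_band[1])
    mask = np.outer(scale_mask, rate_mask)
    return np.real(np.fft.ifft2(np.fft.fft2(spec) * mask))


# Hypothetical stream definitions (assumed band edges, not from the paper).
STREAMS = {
    "slow": {"rate_band": (0.0, 4.0), "scale_band": (0.0, 1.0)},
    "fast": {"rate_band": (4.0, 25.0), "scale_band": (1.0, 4.0)},
}


def multistream_features(spec, frame_rate, chans_per_octave, streams=STREAMS):
    """Return one bandpass-filtered copy of spec per stream."""
    return {
        name: bandpass_modulation_filter(spec, frame_rate, chans_per_octave,
                                         **bands)
        for name, bands in streams.items()
    }
```

Each stream's output has the same shape as the input, so the number of features per frame does not grow with the number of modulation filters inside a band. A practical system would likely use smoother filter shapes than the brick-wall mask assumed here.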


Updated: 2019-11-01
