Speech Segregation Using an Auditory Vocoder With Event-Synchronous Enhancements. IEEE Transactions on Audio, Speech, and Language Processing

IEEE Transactions on Audio, Speech, and Language Processing. Published: 2006-11-01. DOI: 10.1109/tasl.2006.872611
Toshio Irino, Roy D. Patterson, Hideki Kawahara

We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. Unlike conventional window-based processing systems, the auditory representation of sound, referred to as an "auditory image," preserves fine temporal information. This makes it possible to segregate speech sources with an event-synchronous procedure: fundamental frequency (F0) information is used to estimate the sequence of glottal pulse times for a target speaker and to suppress the glottal events of other speakers. The procedure extracts the target speech robustly and segregates it effectively even when the signal-to-noise ratio is as low as 0 dB. Moreover, segregation performance remains high when the speech contains jitter or when the F0 estimate is inaccurate. This contrasts with conventional comb-filter methods, in which errors in F0 estimation markedly reduce performance. We compared the new method with a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
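The event-synchronous idea can be illustrated with a minimal sketch. The abstract does not say how glottal pulse times are derived from F0, so the code below assumes one simple approach: treat a frame-wise F0 contour as piecewise constant, accumulate vocal-fold phase in cycles, and place a pulse wherever the phase crosses an integer. It also includes a generic peak normalized cross-correlation of the kind that could score a segregated signal against the clean target; the measure actually used in the paper may differ. The function names `glottal_pulse_times` and `normalized_xcorr_peak` and their parameters are hypothetical.

```python
import math
from typing import List, Sequence


def glottal_pulse_times(f0_hz: Sequence[float], frame_period_s: float) -> List[float]:
    """Hypothetical sketch: estimate glottal pulse times (s) from a frame-wise F0 contour.

    F0 is assumed constant within each frame. Phase (in cycles) is accumulated
    frame by frame, and a pulse is placed at every integer crossing. Unvoiced
    frames (F0 <= 0) produce no pulses and reset the phase, so the first pulse
    after a reset falls one period into the next voiced frame.
    """
    pulses: List[float] = []
    phase = 0.0  # accumulated cycles since the last reset
    for i, f0 in enumerate(f0_hz):
        t0 = i * frame_period_s  # start time of this frame
        if f0 <= 0.0:
            phase = 0.0
            continue
        end_phase = phase + f0 * frame_period_s
        # Place a pulse at each integer k with phase < k <= end_phase.
        k = math.floor(phase) + 1
        while k <= end_phase:
            pulses.append(t0 + (k - phase) / f0)
            k += 1
        phase = end_phase
    return pulses


def normalized_xcorr_peak(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Hypothetical sketch: peak of the energy-normalized cross-correlation of x and y
    over lags -max_lag..max_lag (1.0 means identical up to a small shift)."""
    n = min(len(x), len(y))
    ex = math.sqrt(sum(v * v for v in x[:n]))
    ey = math.sqrt(sum(v * v for v in y[:n]))
    if ex == 0.0 or ey == 0.0:
        return 0.0  # correlation is undefined for a silent signal
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                s += x[i] * y[j]
        best = max(best, s / (ex * ey))
    return best
```

For a constant 100 Hz contour sampled every 10 ms, the sketch places one pulse per frame, 10 ms apart. Real F0 trackers report per-frame estimates with jitter and errors, which is exactly the condition under which the abstract reports that the event-synchronous method degrades more gracefully than comb filtering.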

Last updated: 2019-11-01
