A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields. IEEE/ACM Transactions on Audio, Speech, and Language Processing

IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 5.4), Pub Date: 2015-12-01, DOI: 10.1109/taslp.2015.2481179
Michael A Carlin, Mounya Elhilali

One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes: it can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is a listener's ability to dynamically track speech in noisy environments, the solution found in auditory neurophysiology suggests a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in many automated speech processing systems, and its performance often degrades in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance the spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show that an adapted ensemble of STRFs detects speech in unseen noisy environments better than both an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate that the adapted STRF ensemble better captures the spectro-temporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations and improve noise robustness.
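The abstract's core idea is that each STRF acts as a filter over a spectrogram, and that adaptation reshapes the filter so it responds more to speech than to nonspeech. The sketch below illustrates that idea under stated assumptions only: it uses the standard linear STRF model r(t) = sum_f sum_tau h[f][tau] * s[f][t - tau], a hypothetical contrast objective (mean response energy on a target minus mean energy on a distractor), and plain projected gradient ascent with a unit-norm constraint. The function names (strf_response, ensemble_features, adapt_strf, and so on), the toy spectrograms, and the update rule are illustrative inventions; they are not the authors' adaptation procedure, which the abstract does not specify.

```python
import math
from typing import List

# A spectrogram or STRF stored as rows of frequency channels: m[f][t].
Matrix = List[List[float]]


def strf_response(strf: Matrix, spec: Matrix) -> List[float]:
    """Linear STRF model: r(t) = sum_f sum_tau h[f][tau] * s[f][t - tau].

    Frames before t = 0 are treated as silence (zero input).
    """
    n_time = len(spec[0])
    out = []
    for t in range(n_time):
        acc = 0.0
        for f, row in enumerate(strf):
            for tau, w in enumerate(row):
                if t - tau >= 0:
                    acc += w * spec[f][t - tau]
        out.append(acc)
    return out


def mean_energy(strf: Matrix, spec: Matrix) -> float:
    """Time-averaged squared response of one STRF to one spectrogram."""
    r = strf_response(strf, spec)
    return sum(v * v for v in r) / len(r)


def ensemble_features(strfs: List[Matrix], spec: Matrix) -> List[float]:
    """One energy feature per STRF (hypothetical input to a SAD classifier)."""
    return [mean_energy(h, spec) for h in strfs]


def contrast(strf: Matrix, target: Matrix, distractor: Matrix) -> float:
    """Hypothetical objective: energy on target minus energy on distractor."""
    return mean_energy(strf, target) - mean_energy(strf, distractor)


def _energy_grad(strf: Matrix, spec: Matrix) -> Matrix:
    """Gradient of mean_energy with respect to every STRF coefficient."""
    r = strf_response(strf, spec)
    n_time = len(r)
    return [
        [2.0 * sum(r[t] * spec[f][t - tau] for t in range(tau, n_time)) / n_time
         for tau in range(len(row))]
        for f, row in enumerate(strf)
    ]


def _normalize(strf: Matrix) -> Matrix:
    """Rescale to unit L2 norm so contrast cannot grow just by scaling."""
    norm = math.sqrt(sum(w * w for row in strf for w in row))
    return [[w / norm for w in row] for row in strf]


def adapt_strf(strf: Matrix, target: Matrix, distractor: Matrix,
               lr: float = 0.05, steps: int = 20) -> Matrix:
    """Projected gradient ascent on contrast(); NOT the paper's update rule."""
    h = _normalize(strf)
    for _ in range(steps):
        g_t = _energy_grad(h, target)
        g_d = _energy_grad(h, distractor)
        # Step toward more target energy and less distractor energy.
        h = _normalize([
            [w + lr * (gt - gd) for w, gt, gd in zip(row, rt, rd)]
            for row, rt, rd in zip(h, g_t, g_d)
        ])
    return h


if __name__ == "__main__":
    # Toy data: "speech" modulates channel 0, "noise" is steady in channel 1.
    target = [[1, 0, 1, 0, 1, 0, 1, 0], [0] * 8]
    distractor = [[0] * 8, [1] * 8]
    h0 = [[0.5, 0.5], [0.5, 0.5]]  # 2 channels x 2 lags, unit norm
    h1 = adapt_strf(h0, target, distractor)
    print("contrast before:", contrast(h0, target, distractor))
    print("contrast after:", contrast(h1, target, distractor))
    print("features:", ensemble_features([h0, h1], target))
```

In this toy run the adapted filter shifts its weight onto the modulated "speech" channel and away from the steady "noise" channel, which is the qualitative behavior the abstract describes for speech-adapted STRFs. The unit-norm projection is a design choice made here so that the contrast can only improve by changing the filter's shape, not its overall gain.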

Updated: 2019-11-01