A Single-Input/Binaural-Output Antiphasic Speech Enhancement Method for Speech Intelligibility Improvement
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2021-07-07, DOI: 10.1109/lsp.2021.3095016
Ningning Pan, Yuzhu Wang, Jingdong Chen, Jacob Benesty

Improving intelligibility of a speech signal of interest from its observations (with a single microphone) corrupted by additive noise has long been a challenging problem. Motivated by important findings achieved in the psychoacoustic field, we propose in this work a deep learning based method to render the noise and desired speech in the perceptual space such that the perception of the desired speech is least affected by the noise. Specifically, we adopt the temporal convolutional network (TCN) based structure to map the single-channel noisy observations into two binaural signals, one for the left ear and the other for the right ear. The TCN is trained in such a way that the desired speech and noise will be perceived to be in opposite directions when the listener listens to the binaural signals. This antiphasic binaural presentation enables the listener to better distinguish the desired speech from the annoying noise for improved speech intelligibility. The modified rhyme test is performed for evaluation and the results justify the superiority of the proposed method for speech intelligibility improvement.
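
To make the idea of an antiphasic binaural presentation concrete, the sketch below builds a left/right pair in which the speech component is identical at both ears while the noise component is sign-inverted at one ear. This is an assumption for illustration only: the abstract does not state how the training targets are constructed, and the function name `make_antiphasic_pair` is hypothetical. It corresponds to the classic "speech in phase, noise out of phase" configuration from the psychoacoustic literature, not necessarily to the authors' exact rendering, and it uses known clean speech and noise rather than a trained TCN.

```python
import numpy as np


def make_antiphasic_pair(speech, noise):
    """Return (left, right) with speech in phase and noise in antiphase.

    Illustrative assumption, not the paper's stated procedure:
    left  = speech + noise
    right = speech - noise
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")
    left = speech + noise    # noise with positive sign at the left ear
    right = speech - noise   # same noise, sign-inverted at the right ear
    return left, right


if __name__ == "__main__":
    # Toy signals: a 440 Hz tone stands in for speech, white noise for the interferer.
    fs = 16000
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    speech = 0.5 * np.sin(2 * np.pi * 440.0 * t)
    noise = 0.1 * rng.standard_normal(fs)

    left, right = make_antiphasic_pair(speech, noise)

    # The mid signal recovers the speech and the side signal recovers the noise,
    # which shows that the two components occupy opposite interaural phases.
    print(np.allclose((left + right) / 2, speech))
    print(np.allclose((left - right) / 2, noise))
```

In the method described above, a network would have to produce such a pair from the single noisy mixture alone; the sketch only shows what an antiphasic target pair could look like when the clean components are available.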

Updated: 2021-07-30