Incorporating group update for speech enhancement based on convolutional gated recurrent network
Speech Communication ( IF 3.2 ) Pub Date : 2021-05-29 , DOI: 10.1016/j.specom.2021.05.003
Wenhao Yuan

To further improve the performance of speech enhancement methods based on deep neural networks, this paper proposes a speech enhancement network that makes full use of the characteristics of noisy speech in the time–frequency domain. First, motivated by the local correlation of noisy speech in the time–frequency domain and the spatial structure of its frequency features, a convolutional gated recurrent network (CGRN) is built for speech enhancement: a recurrent neural network (RNN) models the temporal correlation of noisy speech, and a convolutional neural network (CNN) computes its frequency features. Second, because noisy speech varies over time in different ways, a group update mechanism is introduced to improve CGRN further: the hidden-layer features of the recurrent neural network in CGRN are manually divided into three groups, each updated in a different way, so that the CGRN incorporating group update (CGRN-GU) roughly distinguishes three cases of variation and can better track changes in noisy speech over time. Finally, a causal speech enhancement method based on CGRN-GU is proposed, and extensive experiments are conducted on a public dataset. The experimental results show that, compared with state-of-the-art methods, the proposed method achieves better speech enhancement performance than other causal methods and even outperforms most non-causal methods.
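The recurrence described in the abstract can be illustrated with a minimal NumPy sketch of a convolutional GRU step, in which the gates are computed by convolving along the frequency axis and the hidden state is carried forward frame by frame, so the processing is causal. This is not the paper's implementation. The abstract does not state the three group update rules, so the rules in `cgrn_gu_step` (a learned gate, a full overwrite, and a fixed slow rate `alpha`) are placeholders chosen only to show how per-group updates could be wired in. All function names, parameter names, and sizes here are assumptions.

```python
import numpy as np

def conv1d_same(x, w):
    """Cross-correlation along frequency with zero padding.
    x: (C_in, F), w: (C_out, C_in, K) with odd K -> (C_out, F)."""
    c_out, _, k = w.shape
    n_freq = x.shape[1]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    out = np.zeros((c_out, n_freq))
    for i in range(k):
        # Contribution of kernel tap i to every frequency bin.
        out += w[:, :, i] @ xp[:, i:i + n_freq]
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gates(x, h, p):
    """Update gate z and candidate state for one frame (biases omitted)."""
    xh = np.concatenate([x, h], axis=0)
    z = sigmoid(conv1d_same(xh, p["Wz"]))          # update gate
    r = sigmoid(conv1d_same(xh, p["Wr"]))          # reset gate
    xrh = np.concatenate([x, r * h], axis=0)
    h_tilde = np.tanh(conv1d_same(xrh, p["Wh"]))   # candidate state
    return z, h_tilde

def cgrn_step(x, h, p):
    """Plain convolutional GRU update for one time frame."""
    z, h_tilde = gates(x, h, p)
    return (1.0 - z) * h + z * h_tilde

def cgrn_gu_step(x, h, p, alpha=0.1):
    """Hypothetical group update: hidden channels split into three groups.
    The three rules are illustrative placeholders, not the paper's rules."""
    z, h_tilde = gates(x, h, p)
    g = h.shape[0] // 3
    # Group 1 (channels [0, g)): keeps the learned gate z.
    z[g:2 * g] = 1.0    # group 2: overwritten by the candidate every frame
    z[2 * g:] = alpha   # group 3: fixed slow update rate
    return (1.0 - z) * h + z * h_tilde

def run(frames, p, step, c_h):
    """Process frames (T, C_in, F) one at a time; uses past frames only."""
    h = np.zeros((c_h, frames.shape[2]))
    states = []
    for x in frames:
        h = step(x, h, p)
        states.append(h)
    return np.stack(states)

def init_params(c_in, c_h, k, rng):
    shape = (c_h, c_in + c_h, k)
    return {name: 0.1 * rng.standard_normal(shape) for name in ("Wz", "Wr", "Wh")}

rng = np.random.default_rng(0)
T, C_IN, C_H, F, K = 5, 1, 6, 16, 3   # toy sizes, chosen arbitrarily
params = init_params(C_IN, C_H, K, rng)
frames = rng.standard_normal((T, C_IN, F))
states = run(frames, params, cgrn_gu_step, C_H)
print(states.shape)  # (5, 6, 16)
```

Because each hidden state depends only on the current frame and the previous state, changing a later frame leaves all earlier states unchanged. That is the causal property the abstract refers to.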




Updated: 2021-06-01