Deep Learning-Based Amplitude Fusion for Speech Dereverberation
Discrete Dynamics in Nature and Society (IF 1.4), Pub Date: 2020-07-14, DOI: 10.1155/2020/4618317
Chunlei Liu 1, 2 , Longbiao Wang 2 , Jianwu Dang 2, 3

Mapping and masking are two important deep learning-based speech enhancement methods that aim to recover the original clean speech from corrupted speech. In practice, large recovery errors severely limit the improvement in speech quality. In our preliminary experiment, we showed that the mapping and masking methods have different conversion mechanisms, and we therefore hypothesized that their recovery errors are highly likely to be complementary; this complementarity was subsequently validated. Based on the principle of error minimization, we propose fusing mapping and masking for speech dereverberation. Specifically, we take the weighted mean of the amplitudes recovered by the two methods as the estimated amplitude of the fusion method. Experiments verify that the fusion method further reduces the recovery error. Compared with the existing geometric mean method, the proposed weighted mean method achieves better results. Speech dereverberation experiments show that the weighted mean method improves PESQ and SNR by 5.8% and 25.0%, respectively, compared with the traditional masking method.
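
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the two amplitude estimates are already available as arrays; the function names, the `weight` parameter, and the toy values are hypothetical and are not taken from the paper, which does not state its weights in the abstract. The toy data deliberately gives the two estimates opposite-sign errors to show why complementary errors favor fusion.

```python
import numpy as np


def fuse_weighted_mean(amp_mapping, amp_masking, weight=0.5):
    """Weighted mean of two amplitude estimates.

    `weight` (hypothetical name) is applied to the mapping estimate and
    `1 - weight` to the masking estimate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * amp_mapping + (1.0 - weight) * amp_masking


def fuse_geometric_mean(amp_mapping, amp_masking):
    """Geometric mean of two non-negative amplitude estimates (baseline)."""
    return np.sqrt(amp_mapping * amp_masking)


# Toy example (made-up values): the two estimates err in opposite directions.
clean = np.array([1.0, 2.0, 3.0])
error = np.array([0.2, -0.1, 0.3])
amp_mapping = clean + error  # stand-in for the mapping network's output
amp_masking = clean - error  # stand-in for mask * reverberant amplitude

fused = fuse_weighted_mean(amp_mapping, amp_masking, weight=0.5)
fused_geo = fuse_geometric_mean(amp_mapping, amp_masking)


def mse(estimate):
    """Mean squared error against the clean amplitude."""
    return float(np.mean((estimate - clean) ** 2))


print("mapping MSE:       ", mse(amp_mapping))
print("masking MSE:       ", mse(amp_masking))
print("weighted-mean MSE: ", mse(fused))
print("geometric-mean MSE:", mse(fused_geo))
```

In this contrived case the errors cancel exactly under equal weights, which real network outputs would not do; in practice the weight would have to be chosen, for example on held-out data, and the gain depends on how complementary the two errors actually are.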

Updated: 2020-07-14