Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement, IEEE Signal Processing Letters



IEEE Signal Processing Letters (IF 3.2), Pub Date: 2022-08-22, DOI: 10.1109/lsp.2022.3200581
Hansol Kim, Kyeongmuk Kang, Jong Won Shin


Traditionally, adaptive beamformers such as the minimum-variance distortionless response (MVDR) beamformer and the generalized eigenvalue beamformer have been widely used for multi-channel speech enhancement with a single-channel postfilter. Recently, several approaches have been proposed to enhance the signals used to estimate speech and noise spatial covariance matrices (SCMs) and to process the outputs of the beamformers using deep neural networks (DNNs). However, the preprocessing of the signals for SCM estimation may disrupt phase relations among input signals, and the time averages used to estimate speech and noise SCMs may not be optimal for beamformer performance even though the estimated signals are close to the ground truth. In this letter, we propose a deep beamforming approach that estimates factors of the MVDR beamformer using a DNN to circumvent the difficulty of speech and noise SCM estimation. We formulate the MVDR beamformer in a factorized form related to two complex factors and estimate them using a DNN with a cost function comparing the beamformed signal with the original clean speech. Experimental results showed that the proposed factorized MVDR beamformer could mimic the characteristics of the MVDR beamformer with the true relative transfer function and noise SCM, and that it outperformed the MVDR beamformer with deep learning-based pre- and post-processing in terms of perceptual evaluation of speech quality scores.
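For orientation, the conventional MVDR beamformer that the abstract uses as its reference point can be sketched in a few lines. This is the textbook closed form w = Φ_n⁻¹ h / (hᴴ Φ_n⁻¹ h) for a single frequency bin, with h the relative transfer function and Φ_n the noise SCM; it is not the factorized form proposed in the letter, whose two complex factors are not specified in the abstract. The function name `mvdr_weights`, the four-microphone setup, and the random RTF and noise SCM are all assumptions made for illustration.

```python
import numpy as np


def mvdr_weights(noise_scm, rtf):
    """Textbook MVDR weights for one frequency bin (illustrative helper)."""
    # Solve Phi_n x = h rather than forming the inverse explicitly.
    numerator = np.linalg.solve(noise_scm, rtf)
    # h^H Phi_n^{-1} h; np.vdot conjugates its first argument.
    denominator = np.vdot(rtf, numerator)
    return numerator / denominator


rng = np.random.default_rng(0)
num_mics = 4  # assumed array size

# Hypothetical RTF, normalized so the reference microphone (index 0) is 1.
rtf = np.concatenate((
    [1.0 + 0j],
    rng.standard_normal(num_mics - 1) + 1j * rng.standard_normal(num_mics - 1),
))

# Hypothetical noise SCM: Hermitian positive definite by construction.
a = rng.standard_normal((num_mics, num_mics)) + 1j * rng.standard_normal((num_mics, num_mics))
noise_scm = a @ a.conj().T + 1e-3 * np.eye(num_mics)

w = mvdr_weights(noise_scm, rtf)

# Distortionless constraint: w^H h should equal 1.
distortionless_response = np.vdot(w, rtf)
# Residual noise power at the beamformer output: w^H Phi_n w.
output_noise_power = np.real(np.vdot(w, noise_scm @ w))
# Noise power at the reference microphone, for comparison.
reference_noise_power = np.real(noise_scm[0, 0])
```

Because simply selecting the reference microphone also satisfies the distortionless constraint when h[0] = 1, the MVDR output noise power can never exceed the reference-microphone noise power. In practice h and Φ_n must be estimated from the noisy input, and it is this estimation step that the letter seeks to bypass by having a DNN predict the beamformer factors directly.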


Updated: 2024-08-26
