Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing



Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2022-01-25, DOI: 10.1109/taslp.2022.3145304
Taihui Wang, Feiran Yang, Jun Yang

Most multichannel blind source separation (BSS) approaches rely on a spatial model to encode the transfer functions from sources to microphones and a source model to encode the source power spectral density. The rank-1 spatial model has been widely exploited in independent component analysis (ICA), independent vector analysis (IVA), and independent low-rank matrix analysis (ILRMA). The full-rank spatial model is also considered in many BSS approaches, such as full-rank spatial covariance matrix analysis (FCA), multichannel nonnegative matrix factorization (MNMF), and FastMNMF, which can improve the separation performance in the case of long reverberation times. This paper proposes a new MNMF framework based on the convolutive transfer function (CTF) for overdetermined BSS. The time-domain convolutive mixture model is approximated by a frequency-wise convolutive mixture model instead of the widely adopted frequency-wise instantaneous mixture model. The iterative projection algorithm is adopted to estimate the demixing matrix, and the multiplicative update rule is employed to estimate nonnegative matrix factorization (NMF) parameters. Finally, the source image is reconstructed using a multichannel Wiener filter. The advantages of the proposed method are twofold. First, the CTF approximation enables us to use a short window to represent long impulse responses. Second, the full-rank spatial model can be derived based on the CTF approximation and slowly time-variant source variances, and close relationships between the proposed method and ILRMA, FCA, MNMF and FastMNMF are revealed. Extensive experiments show that the proposed algorithm achieves a higher separation performance than ILRMA and FastMNMF in reverberant environments.
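
The contrast between the frequency-wise instantaneous mixture model and the frequency-wise convolutive (CTF) mixture model can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ctf_mix`, the array layouts, and the number of taps `L` are assumptions chosen for illustration. In each frequency bin, every microphone signal is modeled as a short filter applied across STFT frames of each source, and setting `L = 1` recovers the instantaneous model.

```python
import numpy as np

def ctf_mix(S, A):
    """Frequency-wise convolutive mixture of N sources into M channels.

    S: complex STFT of the sources, shape (F, T, N)       (assumed layout)
    A: CTF coefficients, shape (F, L, M, N)                (assumed layout)
       L = 1 reduces to the instantaneous mixture model.
    Returns the mixture STFT X with shape (F, T, M).
    """
    F, T, N = S.shape
    _, L, M, _ = A.shape
    X = np.zeros((F, T, M), dtype=complex)
    for l in range(min(L, T)):
        # Tap l acts on the source frames delayed by l frames.
        X[:, l:, :] += np.einsum("fmn,ftn->ftm", A[:, l], S[:, :T - l])
    return X

# Hypothetical sizes: 4 frequency bins, 10 frames, 2 sources, 3 microphones.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 10, 2)) + 1j * rng.standard_normal((4, 10, 2))
A = rng.standard_normal((4, 3, 3, 2)) + 1j * rng.standard_normal((4, 3, 3, 2))
X = ctf_mix(S, A)  # shape (4, 10, 3)
```

With more than one tap, reverberation that outlasts a single analysis window is carried by the later taps, which is why a short window can still represent a long impulse response.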


Updated: 2022-01-25
