Supervised Single Channel Speech Enhancement Based on Stationary Wavelet Transforms and Non-negative Matrix Factorization with Concatenated Framing Process and Subband Smooth Ratio Mask

Journal of Signal Processing Systems (IF 1.6), Pub Date: 2019-09-13, DOI: 10.1007/s11265-019-01480-7
Md Shohidul Islam, Tarek Hasan Al Mahmud, Wasim Ullah Khan, Zhongfu Ye

In this paper, we propose a novel single-channel speech enhancement approach that combines the Stationary Wavelet Transform (SWT) and Nonnegative Matrix Factorization (NMF) with a Concatenated Framing Process (CFP), and we introduce a Subband Smooth Ratio Mask (ssRM). Because the Discrete Wavelet Packet Transform (DWPT) downsamples after filtering, it is not shift-invariant, which introduces errors in signal reconstruction. To mitigate this problem, we first use the SWT together with NMF under the Kullback-Leibler (KL) divergence cost function. Second, rather than applying NMF directly, we use the CFP to build each column of the matrix, which yields a smoother decomposition. Third, we apply Auto-Regressive Moving Average (ARMA) filtering to the newly formed matrices to make the speech representation more stable and standardized. Finally, we propose the ssRM, which combines the Standard Ratio Mask (sRM) and the Square Root Ratio Mask (srRM) with Normalized Cross-Correlation Coefficients (NCCC) to exploit the advantages of all three.

In short, the SWT decomposes the time-domain noisy mixture into a set of subband signals; framing each subband signal and taking its absolute value yields nonnegative matrices. The CFP then forms new matrices in which each column contains five consecutive frames of the nonnegative matrix, and ARMA filtering is applied to them. Next, NMF is applied to each newly formed matrix, and the speech components are detected with the proposed ssRM. Finally, the enhanced signal is reconstructed by the inverse SWT.

We evaluate the approach on the IEEE corpus with different noise types. It significantly improves objective speech quality and intelligibility and outperforms related methods such as conventional STFT-NMF and DWPT-NMF.
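
A minimal sketch of the shift-invariance argument, assuming a Haar filter pair, periodic boundary extension and the à trous (undecimated) scheme. The hypothetical helpers `haar_swt` and `haar_iswt` keep every subband at the full signal length, so circularly shifting the input shifts every subband by the same amount, which a downsampled transform such as the DWPT cannot guarantee. The hypothetical helper `frame_abs` then turns one subband into a nonnegative matrix by non-overlapping framing and taking magnitudes. The wavelet, decomposition depth and frame length used in the paper are not stated in the abstract; in practice a library such as PyWavelets (`pywt.swt`, `pywt.iswt`) would more likely be used than this hand-rolled version.

```python
import numpy as np


def haar_swt(x, levels):
    """Undecimated (stationary) Haar wavelet transform, a trous scheme.

    Returns the detail subbands d_1..d_L and the final approximation a_L.
    Nothing is downsampled, so every subband keeps the length of x.
    Periodic (circular) boundary extension is assumed.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        # At level j+1 the Haar filter taps are 2**j samples apart (a trous "holes").
        shifted = np.roll(approx, -(2 ** j))  # approx[n + 2**j], circularly
        details.append((approx - shifted) / np.sqrt(2.0))
        approx = (approx + shifted) / np.sqrt(2.0)
    return details, approx


def haar_iswt(details, approx):
    """Invert haar_swt level by level: a_{j-1}[n] = (a_j[n] + d_j[n]) / sqrt(2)."""
    a = np.asarray(approx, dtype=float)
    for d in reversed(details):
        a = (a + d) / np.sqrt(2.0)
    return a


def frame_abs(subband, frame_len):
    """Split one subband into non-overlapping frames and take magnitudes.

    Returns a nonnegative (frame_len x n_frames) matrix, one frame per column;
    trailing samples that do not fill a whole frame are dropped.
    """
    subband = np.asarray(subband, dtype=float)
    n_frames = len(subband) // frame_len
    frames = subband[: n_frames * frame_len].reshape(n_frames, frame_len).T
    return np.abs(frames)
```

For example, `haar_swt(x, 3)` on a 256-sample signal gives three detail subbands plus one approximation, each 256 samples long, and `frame_abs(details[0], 32)` gives a 32 x 8 nonnegative matrix.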

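The CFP and ARMA steps can be sketched as follows. The abstract states only that each CFP column holds five consecutive frames; the hop of one frame between columns (so neighbouring columns overlap) is an assumption here, as is the ARMA filter form, which is borrowed from MVA (mean subtraction, variance normalization, ARMA filtering) speech-feature post-processing with a hypothetical order `M = 2`. Both function names are illustrative, not taken from the paper.

```python
import numpy as np


def concatenated_framing(V, context=5):
    """Hypothetical CFP: column t stacks frames t..t+context-1 of V.

    V has shape (frame_len, n_frames); the result has shape
    (context * frame_len, n_frames - context + 1). A hop of one frame is assumed.
    """
    frame_len, n_frames = V.shape
    n_cols = n_frames - context + 1
    if n_cols < 1:
        raise ValueError("need at least `context` frames")
    # Column-major flattening of V[:, t:t+context] puts frame t first, then t+1, ...
    return np.stack(
        [V[:, t : t + context].reshape(-1, order="F") for t in range(n_cols)], axis=1
    )


def arma_smooth(X, order=2):
    """MVA-style ARMA smoothing along time (columns), an assumed filter form.

    y_t = (sum_{i=1..M} y_{t-i} + sum_{j=0..M} x_{t+j}) / (2M + 1) for M <= t < T - M;
    the first and last M columns are copied unchanged.
    """
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    T = X.shape[1]
    M = order
    for t in range(M, T - M):
        past_outputs = Y[:, t - M : t].sum(axis=1)        # M already-smoothed columns
        current_future = X[:, t : t + M + 1].sum(axis=1)  # M + 1 raw columns
        Y[:, t] = (past_outputs + current_future) / (2 * M + 1)
    return Y
```

Because the filter only averages nonnegative values, its output stays nonnegative and therefore remains a valid input for NMF.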

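The remaining steps, supervised NMF under the KL cost and a ratio mask, are sketched below under stated assumptions. Speech and noise bases are assumed to be pretrained separately and then held fixed while only the activations are estimated for the noisy matrix, which is the usual supervised NMF setup, and the Lee-Seung multiplicative updates are assumed as the KL-NMF solver. The abstract does not give the ssRM formula, so the hypothetical `ssrm_sketch` uses common definitions (sRM = S/(S+N), srRM = sqrt(S²/(S²+N²))), computes an NCCC between the speech estimate and the mixture, and blends the two masks with it in a purely illustrative way that should not be read as the paper's formula.

```python
import numpy as np

EPS = 1e-12


def kl_div(V, R):
    """Generalized KL divergence D(V || R), the NMF cost (useful for monitoring)."""
    return float(np.sum(V * np.log((V + EPS) / (R + EPS)) - V + R))


def kl_nmf(V, rank=None, W=None, n_iter=200, seed=0):
    """KL-divergence NMF with Lee-Seung multiplicative updates.

    If W is given it stays fixed and only H is learned (supervised enhancement);
    otherwise W (with `rank` columns) and H are both learned (training).
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        if learn_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def ssrm_sketch(V, W_speech, W_noise, n_iter=200):
    """Hypothetical ssRM-style mask for one nonnegative subband matrix V.

    The blend rho * srRM + (1 - rho) * sRM, with rho the NCCC between the
    speech estimate and the mixture, is an illustrative assumption only.
    """
    W = np.hstack([W_speech, W_noise])
    _, H = kl_nmf(V, W=W, n_iter=n_iter)  # bases fixed, activations learned
    k = W_speech.shape[1]
    S = W_speech @ H[:k]  # speech estimate
    N = W_noise @ H[k:]   # noise estimate
    s_rm = S / (S + N + EPS)
    sr_rm = np.sqrt(S**2 / (S**2 + N**2 + EPS))
    # NCCC of two nonnegative matrices lies in [0, 1] (Cauchy-Schwarz).
    rho = float(np.sum(S * V) / (np.sqrt(np.sum(S**2) * np.sum(V**2)) + EPS))
    mask = rho * sr_rm + (1.0 - rho) * s_rm
    return mask, rho
```

In the full method the mask would still have to be mapped back through the CFP and framing steps and applied to the signed subband coefficients before the inverse SWT; that bookkeeping is omitted here.
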
Updated: 2020-04-18