Combining adaptive sparse NMF feature extraction and soft mask to optimize DNN for speech enhancement
Applied Acoustics (IF 3.4), Pub Date: 2021-01-01, DOI: 10.1016/j.apacoust.2020.107666
Hairong Jia, Weimei Wang, Shulin Mei

Abstract In masking-based deep neural network (DNN) speech enhancement, the time-frequency mask cannot be estimated accurately because the latent structural information of speech is ignored. This paper proposes a speech enhancement method that optimizes a DNN by combining adaptive sparse non-negative matrix factorization (NMF) feature extraction with a soft mask, exploiting the ability of sparse matrices to capture the salient structure of speech together with an optimized mask-based prediction. First, considering whether speech or noise dominates in different noisy speech signals, a new soft-mask estimation method is proposed, in which the initial soft mask is estimated from the speech cochleagram and the noise cochleagram. Then, the speech and noise cochleagrams are learned separately by sparse NMF (SNMF) to obtain a joint dictionary. The noisy speech is sparsely represented on the joint dictionary, and an adaptive adjustment factor related to changes in the speech and noise dictionaries is added to obtain the sparse coefficients. The sparse coefficients are used as the input to the DNN model, and the initial soft mask is used as the training label to estimate the final soft mask. Finally, the estimated soft mask is combined with the noisy speech cochleagram to obtain the enhanced speech. Compared with other methods, the proposed method increases the average signal-to-noise ratio (SNR) by 1.6039 dB, the average perceptual evaluation of speech quality (PESQ) score by 0.1994, and the average short-time objective intelligibility (STOI) by 0.0271, demonstrating the superiority of the proposed algorithm.
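The abstract describes the pipeline only at a high level. The following Python/NumPy sketch illustrates three of its generic building blocks under stated assumptions: a ratio-style soft mask computed from speech and noise cochleagrams, sparse NMF dictionary learning with an L1 penalty on the activations, and sparse coding of the noisy cochleagram on the joint dictionary [W_s, W_n]. It is not the authors' method. The mask formula (the common ideal-ratio-mask form with exponent `beta`), the multiplicative update rules, the penalty weight `lam`, the additive-energy mixing model, and the function names `soft_mask`, `sparse_nmf`, and `sparse_code` are all hypothetical choices for illustration; the paper's adaptive adjustment factor and its DNN are not reproduced, and random arrays stand in for real cochleagrams.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def soft_mask(speech_cg, noise_cg, beta=0.5):
    """Assumed ratio-mask form (S / (S + N)) ** beta, values in [0, 1].

    Stand-in only: the paper's dominance-dependent soft-mask formula
    is not given in the abstract.
    """
    return (speech_cg / (speech_cg + noise_cg + EPS)) ** beta


def sparse_nmf(V, rank, lam=0.1, n_iter=200, seed=0):
    """Learn a dictionary W for V ~= W @ H by minimizing
    0.5 * ||V - W H||_F^2 + lam * sum(H) with W, H >= 0.

    Simplified multiplicative updates; columns of W are renormalized to
    unit L2 norm each iteration (H rescaled so W @ H is unchanged), so the
    L1 penalty on H cannot be dodged by shrinking H and growing W.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        norms = np.linalg.norm(W, axis=0) + EPS
        W /= norms
        H *= norms[:, None]
        H *= (W.T @ V) / (W.T @ W @ H + lam + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    W /= np.linalg.norm(W, axis=0) + EPS
    return W


def sparse_code(V, W, lam=0.1, n_iter=200, seed=0):
    """Non-negative sparse coefficients H for V ~= W @ H with W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + EPS)
    return H


# Toy stand-ins for cochleagrams: 64 gammatone channels x 100 frames.
rng = np.random.default_rng(1)
speech_cg = rng.random((64, 100))
noise_cg = rng.random((64, 100))
noisy_cg = speech_cg + noise_cg  # assumed additive-energy mixing

W_s = sparse_nmf(speech_cg, rank=20)    # speech dictionary
W_n = sparse_nmf(noise_cg, rank=20)     # noise dictionary
W_joint = np.hstack([W_s, W_n])         # joint dictionary
H = sparse_code(noisy_cg, W_joint)      # features that would feed the DNN
label = soft_mask(speech_cg, noise_cg)  # DNN training target
# Here the label stands in for the mask a trained DNN would predict.
enhanced_cg = label * noisy_cg
```

Keeping the dictionaries fixed during sparse coding is what makes the coefficients on W_s and W_n interpretable as speech and noise contributions; in the paper these coefficients, further adjusted by the adaptive factor, replace raw spectral features as the DNN input.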

Updated: 2021-01-01