Vector-to-Vector Regression via Distributional Loss for Speech Enhancement
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2021-01-08, DOI: 10.1109/lsp.2021.3050386
Sabato Marco Siniscalchi

In this work, we leverage a novel distributional loss to improve vector-to-vector regression for feature-based speech enhancement (SE). The distributional loss function is devised based on the Kullback-Leibler divergence between a selected target distribution and a conditional distribution, learned from the data, for each coefficient of the clean speech vector given the noisy input features. A deep model with one softmax layer per coefficient is employed to parametrize the conditional distribution, and the deep model parameters are found by minimizing a weighted sum of the cross-entropies between its outputs and the respective target distributions. Experiments with convolutional neural networks (CNNs) on a publicly available noisy speech dataset obtained from the Voice Bank corpus show consistent improvements over conventional solutions based on the mean squared error (MSE) and the least absolute deviation (LAD). Moreover, our approach compares favourably, in terms of both speech quality and intelligibility, against Mixture Density Networks (MDNs), an approach that also relies on computing parametric conditional distributions, based on Gaussian mixture models (GMMs) and a neural architecture. Comparisons against GAN-based solutions are presented as well.
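To make the training objective concrete, below is a minimal sketch of how such a distributional loss can be set up, assuming each clean-speech coefficient is discretized over a fixed grid of bins and its target distribution is obtained by Gaussian smoothing around the clean value. The fully connected backbone, layer sizes, bin count, bin range, and smoothing width are illustrative placeholders, not the CNN architecture or settings used in the paper.

```python
# Minimal sketch (not the authors' code): one softmax head per output
# coefficient, trained with a weighted sum of per-coefficient cross-entropies
# against smoothed target distributions. All dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IN, D_OUT, BINS = 257, 257, 64          # noisy-feature dim, clean-coefficient dim, bins per coefficient (assumed)
EDGES = torch.linspace(-1.0, 1.0, BINS)   # assumed bin centres for the clean coefficients

class DistributionalRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # fully connected backbone for brevity; the paper uses a CNN
        self.backbone = nn.Sequential(
            nn.Linear(D_IN, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        # BINS logits per output coefficient (one softmax head each)
        self.heads = nn.Linear(1024, D_OUT * BINS)

    def forward(self, x):
        logits = self.heads(self.backbone(x))
        return logits.view(-1, D_OUT, BINS)   # (batch, coefficient, bin)

def target_distribution(clean, sigma=0.05):
    """Gaussian-smoothed target distribution over the bins for each clean
    coefficient; a hard one-hot assignment would be the simpler alternative."""
    dist = -(clean.unsqueeze(-1) - EDGES) ** 2 / (2 * sigma ** 2)
    return F.softmax(dist, dim=-1)            # (batch, coefficient, bin)

def distributional_loss(logits, clean, weights=None):
    """Weighted sum over coefficients of the cross-entropy between each
    predicted distribution and its target distribution (equal, up to a
    constant in the model parameters, to the KL divergence)."""
    log_q = F.log_softmax(logits, dim=-1)
    p = target_distribution(clean)
    ce = -(p * log_q).sum(dim=-1)             # (batch, coefficient)
    if weights is not None:
        ce = ce * weights
    return ce.sum(dim=-1).mean()

# usage sketch with random tensors standing in for noisy/clean feature pairs
model = DistributionalRegressor()
noisy = torch.randn(8, D_IN)
clean = torch.rand(8, D_OUT) * 2 - 1
loss = distributional_loss(model(noisy), clean)
loss.backward()
```

At inference, an enhanced coefficient can be recovered from each softmax head, e.g. as the expectation of the bin centres under the predicted distribution; the choice of decoding rule is left open in this sketch.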

Updated: 2021-02-09