Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features
Speech Communication (IF 2.4), Pub Date: 2021-04-06, DOI: 10.1016/j.specom.2021.03.004
Meet H. Soni, Hemant A. Patil

Objective quality assessment aims to evaluate the perceptual quality of a signal using a machine-based algorithm. Because the subjective evaluation of speech quality poses several challenges, objective measures are needed. A non-intrusive quality assessment metric for noise-suppressed speech assesses the quality of a noise-suppressed signal in the absence of any clean reference signal. As per the ITU-T P.835 recommendation, the quality assessment of noise-suppressed speech involves predicting three quality scores, namely signal quality, background quality, and overall quality, and all three are therefore considered in this study. In recent literature, non-intrusive quality assessment is posed as a regression problem, in which a perceptual model learns the mapping between a set of acoustic features and the corresponding quality scores. Recently, we proposed the use of Deep Autoencoder (DAE) features and Subband Autoencoder (SBAE) features for acoustic representation, with an Artificial Neural Network (ANN) as the regression model. DAE and SBAE are variants of the autoencoder architecture that have a bottleneck structure in the hidden layers. Such an architecture belongs to the class of generalized nonlinear Principal Component Analysis (PCA) that guarantees reconstruction of the input features with arbitrary accuracy. Both feature sets (DAE and SBAE) are extracted using unsupervised deep learning architectures, and they demonstrated better performance than the state-of-the-art spectral feature set, namely Mel Filterbank Energies (FBEs). In this paper, we present a more detailed analysis of the previously proposed DAE and SBAE features, and analyze their usefulness in predicting the signal and background quality scores in addition to the overall quality score. We compare the performance of all three feature sets with each other as well as with the current ITU-T P.563 metric for non-intrusive speech quality assessment. The results of our experiments on the NOIZEUS database suggest that DAE and SBAE features perform relatively better than FBEs in predicting signal and overall quality. On the other hand, FBE features perform slightly better than DAE and SBAE features in predicting background quality. Another major contribution of this paper is that we employ a single ANN to predict all three quality scores simultaneously, and present the results. With this approach, all three scores can be predicted simultaneously with accuracy similar to that of predicting them individually.
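The pipeline the abstract describes (unsupervised bottleneck features followed by a regressor that outputs all three scores at once) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the data are synthetic, the dimensions, learning rate, and function names (`train_autoencoder`, `encode`, `fit_ridge`, `predict`) are assumptions, the autoencoder has a single hidden layer, and a closed-form ridge regression is swapped in for the ANN regressor to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the paper or from NOIZEUS):
# 200 "utterances", each summarised by a 40-dimensional spectral vector.
n, d, k = 200, 40, 8  # samples, input dim, bottleneck dim (all assumed)
latent = rng.normal(size=(n, 4))
X = np.tanh(latent @ rng.normal(size=(4, d))) + 0.05 * rng.normal(size=(n, d))
# Three synthetic targets standing in for signal, background and overall scores.
Y = latent @ rng.normal(size=(4, 3)) + 0.05 * rng.normal(size=(n, 3))


def train_autoencoder(X, k, epochs=300, lr=0.05):
    """Single-hidden-layer autoencoder: tanh bottleneck, linear decoder."""
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, k)); b1 = np.zeros(k)
    W2 = rng.normal(scale=0.1, size=(k, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # bottleneck activations
        E = (H @ W2 + b2) - X      # reconstruction error
        losses.append(float(np.mean(E ** 2)))
        # Backpropagate the mean squared reconstruction error.
        gR = 2.0 * E / E.size
        gW2 = H.T @ gR; gb2 = gR.sum(axis=0)
        gZ = (gR @ W2.T) * (1.0 - H ** 2)
        gW1 = X.T @ gZ; gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1), losses


def encode(X, params):
    """Map inputs to bottleneck features using the trained encoder."""
    W1, b1 = params
    return np.tanh(X @ W1 + b1)


def fit_ridge(F, Y, lam=1e-2):
    """Closed-form ridge regression with a bias column; one weight column per score."""
    A = np.hstack([F, np.ones((F.shape[0], 1))])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)


def predict(F, W):
    """Predict all target columns at once from features F."""
    A = np.hstack([F, np.ones((F.shape[0], 1))])
    return A @ W


enc, losses = train_autoencoder(X, k)  # unsupervised: no scores used here
F = encode(X, enc)                     # bottleneck features, shape (n, k)
W = fit_ridge(F, Y)                    # one regressor for all three scores
scores = predict(F, W)                 # shape (n, 3)
```

Note that with a linear readout such as this ridge regressor, fitting the three scores jointly gives the same result as fitting each separately; a neural regressor with shared hidden layers would not generally have that property, which is why the joint-versus-individual comparison reported in the abstract is an empirical question.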




Updated: 2021-04-20