Single and multiple frame coding of LSF parameters using deep neural network and pyramid vector quantizer,Speech Communication



Single and multiple frame coding of LSF parameters using deep neural network and pyramid vector quantizer
Speech Communication (IF 3.2), Pub Date: 2020-03-19, DOI: 10.1016/j.specom.2020.03.004
Yaxing Li, Ying Kang, Hao Wu, Yu Guo, Jin Meng

In linear predictive speech coders, the linear predictive coding (LPC) parameters are usually transformed into the line spectral frequency (LSF) representation for quantization. In this paper, single-frame and multiple-frame coding schemes for LSF parameters using a deep neural network (DNN) and a pyramid vector quantizer (PVQ) are proposed. In the single-frame scheme, a non-linear DNN predictor, which demonstrates much better prediction performance than the autoregressive (AR) model, is applied to exploit the inter-frame dependency of LSF parameters. The prediction residual signal has a Laplacian distribution and can be efficiently quantized by the PVQ. The performance evaluation using spectral distortion shows that the proposed DNN predictive PVQ outperforms the AR predictive split vector quantization (SVQ). In the multiple-frame scheme, a deep autoencoder possessing linear coder-layer units with Gaussian noise is used to compress and de-correlate multiple LSF frames. The deep autoencoder shows a high degree of modelling flexibility for multiple LSF frames. To quantize the coder-layer vector effectively, a PVQ is considered. The experimental results show that the proposed multi-frame scheme, with the optimal coder-layer dimension determined, outperforms the discrete cosine model (DCM)-based approach in terms of spectral distortion performance and robustness across different speech segments.
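
The abstract does not describe how the PVQ codeword search is carried out. As a rough illustration only, the sketch below shows one common way to map a real-valued residual vector onto a pyramid codebook, that is, the set of integer vectors whose absolute values sum to a fixed pulse count. The function name `pvq_quantize`, the parameter `k`, and the greedy pulse allocation are assumptions made for this example and are not taken from the paper.

```python
def pvq_quantize(x, k):
    """Map a real vector x to an integer vector y with sum(|y_i|) == k.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0.0:
        # Degenerate input: place all pulses on the first coordinate.
        return [k] + [0] * (len(x) - 1)
    # Scale the magnitudes so that they sum to k (the pyramid surface).
    target = [abs(v) * k / l1 for v in x]
    # Start from the floor of each scaled magnitude (targets are >= 0).
    mags = [int(t) for t in target]
    # Hand out the remaining pulses one at a time to the coordinate
    # with the largest remaining shortfall.
    for _ in range(k - sum(mags)):
        i = max(range(len(x)), key=lambda j: target[j] - mags[j])
        mags[i] += 1
    # Restore the signs of the input.
    return [m if v >= 0 else -m for m, v in zip(mags, x)]


if __name__ == "__main__":
    residual = [1.5, -0.5, 0.25, -0.25]  # made-up values, not paper data
    print(pvq_quantize(residual, 10))
```

A larger `k` gives a finer grid on the pyramid at the cost of more bits per vector. A full coder would also need to index the chosen codeword and transmit a gain, which this sketch leaves out.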


Updated: 2020-03-19
