RNN-based signal classification for hybrid audio data compression, Computing

Computing (IF 3.3), Pub Date: 2019-03-26, DOI: 10.1007/s00607-019-00713-8
Weiping Tu, Yuhong Yang, Bo Du, Wanzhao Yang, Xiong Zhang, Jiaxi Zheng

Audio data are a fundamental component of multimedia big data. Switched audio codecs have proved efficient for compressing a wide range of audio signals at low bit rates. However, coding quality relies strongly on accurate classification of the input signals. AMR-WB+, the state-of-the-art switched audio coder, adopts two coding mode selection methods. The closed-loop method achieves good quality but has high computational complexity. Conversely, the open-loop method reduces complexity but yields unsatisfactory coding quality. Therefore, this study investigates speech/music discrimination based on a recurrent neural network (RNN) model to improve the coding performance of AMR-WB+. An RNN model is chosen for its outstanding performance on time series. Its recurrent structure allows it to learn and fully exploit the temporal information in the input sequences, compensating for the deficiencies of short-term features. We quantitatively analyze the quality loss caused by two types of misclassification and tune the parameters of the classifier to improve the signal-to-noise ratio (SNR) of the synthesized signals. The experimental results show that, compared with the open-loop method, the proposed method improves mode-selection accuracy by 18% and coding quality by 0.21 dB in segmental SNR. Moreover, it reduces computational complexity by about 43% compared with the closed-loop method in AMR-WB+.
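
The abstract reports coding quality as segmental SNR but does not state how it is computed. As a rough illustration only, the sketch below uses one common textbook definition: the per-frame SNR in dB between the original and the synthesized signal, averaged over frames. The function name `segmental_snr`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are assumptions made for this example and are not taken from the paper.

```python
import math


def segmental_snr(reference, synthesized, frame_len=256,
                  floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a reference and a synthesized signal.

    frame_len, floor_db and ceil_db are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    if len(reference) != len(synthesized):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # avoids division by zero and log of zero
    frame_snrs = []
    # Walk over complete, non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        syn = synthesized[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - s) ** 2 for r, s in zip(ref, syn))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly coded frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    # Toy signal: a sine wave and a slightly attenuated copy standing in for codec output.
    original = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1024)]
    decoded = [0.98 * x for x in original]
    print(f"segmental SNR: {segmental_snr(original, decoded):.2f} dB")
```

Under this definition, a gain such as the reported 0.21 dB would correspond to a 0.21 dB difference in the averaged per-frame value between two coding configurations evaluated on the same input.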


Updated: 2019-03-26
