RNN-based signal classification for hybrid audio data compression


Abstract

Audio data are a fundamental component of multimedia big data. Switched audio coding has proven efficient for compressing a wide range of audio signals at low bit rates, but coding quality depends strongly on the accurate classification of the input signals. Two coding-mode selection methods are adopted in AMR-WB+, the state-of-the-art switched audio coder. The closed-loop method achieves good quality but at high computational complexity; conversely, the open-loop method reduces complexity but yields unsatisfactory coding quality. In this study, therefore, a speech/music discriminator based on a recurrent neural network (RNN) is investigated to improve the coding performance of AMR-WB+. An RNN is chosen for its outstanding performance in processing time series: its recurrent structure allows it to learn and fully exploit the temporal information in the input sequences, compensating for the deficiencies of short-term features. We quantitatively analyze the quality loss caused by the two types of misclassification and tune the classifier's decision parameter to improve the signal-to-noise ratio (SNR) of the synthesized signals. The experimental results show that the proposed method increases mode-selection accuracy by 18% and coding quality by 0.21 dB in segmental SNR compared with the open-loop method, while reducing computational complexity by about 43% compared with the closed-loop method in AMR-WB+.
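
The abstract does not specify the discriminator's architecture, so the following is a minimal sketch of the idea, assuming PyTorch: a GRU runs over per-frame short-term features so that each frame's decision can draw on temporal context, and a tunable threshold converts the speech probability into a coding-mode choice. The class name, feature dimension, and layer sizes are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class SpeechMusicRNN(nn.Module):
    """Illustrative GRU-based speech/music discriminator (hypothetical).

    Input:  (batch, frames, n_features) per-frame feature vectors.
    Output: (batch, frames) probability that each frame is speech.
    """
    def __init__(self, n_features=20, hidden_size=64, num_layers=1):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size,
                          num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):
        h, _ = self.gru(x)  # recurrent pass carries temporal context forward
        return torch.sigmoid(self.out(h)).squeeze(-1)

# Mode selection with a tunable threshold: shifting it trades one type of
# misclassification against the other, which is what the quantitative
# analysis of quality loss would drive.
model = SpeechMusicRNN()
features = torch.randn(1, 100, 20)   # 100 frames of dummy features
p_speech = model(features)
modes = (p_speech > 0.5).int()       # 1 = speech mode, 0 = music mode
```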
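
Segmental SNR, the quality measure quoted above, is the per-frame SNR averaged in the dB domain. A minimal sketch, assuming NumPy; the frame length and the conventional clipping range are illustrative choices rather than values taken from the paper.

```python
import numpy as np

def segmental_snr(reference, synthesized, frame_len=256, eps=1e-10):
    """Average per-frame SNR (dB) between a reference signal and the
    codec's synthesized output. Frame SNRs are clipped to a common
    range so silent or pathological frames do not dominate the mean."""
    n_frames = len(reference) // frame_len
    snrs = []
    for i in range(n_frames):
        ref = reference[i * frame_len:(i + 1) * frame_len]
        err = ref - synthesized[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(ref ** 2) + eps) /
                              (np.sum(err ** 2) + eps))
        snrs.append(np.clip(snr, -10.0, 35.0))
    return float(np.mean(snrs))
```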

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61671335) and the Technological Innovation Major Project of Hubei Province (No. 2017AAA123).

Author information

Corresponding author

Correspondence to Weiping Tu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Tu, W., Yang, Y., Du, B. et al. RNN-based signal classification for hybrid audio data compression. Computing 102, 813–827 (2020). https://doi.org/10.1007/s00607-019-00713-8
