
Real-time speech enhancement algorithm for transient noise suppression


Abstract

To effectively suppress both stationary and transient noise, a real-time single-channel speech enhancement algorithm is proposed. First, quantile-based noise estimation is used to obtain the spectrum of the stationary noise. Then, a transient noise detection method based on the normalized variance and the spectral center of gravity of the signal is proposed to modify the estimated noise spectrum. Next, the speech presence probability is estimated from speech features and harmonic analysis. Finally, the optimally modified log-spectral amplitude (OM-LSA) estimator is applied for speech enhancement. The noise set used in the experiments contains 115 environmental sounds, mixed at SNRs from −10 dB to 10 dB. The experimental results show that the denoising performance of the proposed algorithm is comparable to that of the OM-LSA algorithm, while its real-time performance is much better. Compared with the WebRTC real-time algorithm, over the combined stationary and transient noise conditions, the overall speech quality indicators of the proposed algorithm improve by 7.5%, 7.8%, and 5.0%, respectively, and the short-time objective intelligibility improves by 2.4%, 2.4%, and 2.0%, respectively. Even compared with a recurrent neural network (RNN) based algorithm, the transient noise suppression is better. In addition, a real-time experiment on a hardware platform shows that processing a 10 ms frame takes 4.3 ms.
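The following Python sketch (not part of the paper) illustrates the kind of per-frame pipeline the abstract describes: quantile-based stationary noise estimation, transient detection from the normalized variance and spectral center of gravity, a speech presence probability, and an OM-LSA-style gain. All constants and thresholds, the simplified a priori SNR and speech-presence estimates, and the omission of overlap-add are assumptions for illustration, not the authors' implementation.

```python
# Illustrative per-frame pipeline (assumed constants; not the authors' code).
import numpy as np
from scipy.special import exp1  # exponential integral used by the LSA gain

FRAME = 160        # 10 ms frame at an assumed 16 kHz sample rate
NFFT = 256
QUANTILE = 0.5     # quantile for stationary noise estimation (assumed)
HISTORY = 50       # past frames kept per frequency bin (assumed)

def quantile_noise(psd_history):
    """Stationary noise PSD as a per-bin quantile of recent frame PSDs."""
    return np.quantile(np.array(psd_history), QUANTILE, axis=0)

def transient_detected(frame, psd, var_thr=2.0, centroid_thr=0.6):
    """Flag a transient from the normalized variance and the spectral
    center of gravity; both thresholds are illustrative assumptions."""
    norm_var = np.var(frame) / (np.mean(np.abs(frame)) ** 2 + 1e-12)
    bins = np.arange(len(psd))
    centroid = np.sum(bins * psd) / ((np.sum(psd) + 1e-12) * len(psd))
    return norm_var > var_thr and centroid > centroid_thr

def omlsa_gain(xi, gamma, p, g_min=0.1):
    """OM-LSA-style gain: LSA gain weighted by speech presence probability p."""
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)
    g_lsa = (xi / (1.0 + xi)) * np.exp(0.5 * exp1(v))
    return np.minimum(g_lsa, 1.0) ** p * g_min ** (1.0 - p)

def enhance_frame(frame, psd_history):
    spec = np.fft.rfft(frame, NFFT)
    psd = np.abs(spec) ** 2
    psd_history.append(psd)
    if len(psd_history) > HISTORY:
        psd_history.pop(0)

    noise = quantile_noise(psd_history)
    if transient_detected(frame, psd):
        # Simplified stand-in for the paper's noise-spectrum modification:
        # inflate the noise estimate toward the instantaneous PSD.
        noise = np.maximum(noise, 0.8 * psd)

    gamma = psd / (noise + 1e-12)                 # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 1e-3)            # crude a priori SNR (placeholder)
    p = np.clip((gamma - 1.0) / 3.0, 0.05, 0.99)  # placeholder speech presence prob.
    return np.fft.irfft(omlsa_gain(xi, gamma, p) * spec, NFFT)[:FRAME]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(FRAME * 100)      # stand-in noisy signal
    history = []
    out = [enhance_frame(noisy[i * FRAME:(i + 1) * FRAME], history) for i in range(100)]
    print("processed", len(out), "frames")
```

In a full implementation the a priori SNR would normally be tracked with a decision-directed estimator and the frames recombined by overlap-add, as in the standard OM-LSA formulation.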



Acknowledgments

The authors would like to thank the reviewers for their valuable suggestions and comments. They also thank Mr. Chao He for his excellent work on the algorithm design and programming. This work was supported in part by the National Key Research and Development Program of China under Grants 2020YFC2004003 and 2020YFC2004002, and by the National Natural Science Foundation of China under Grant 62001215.

Author information

Corresponding author

Correspondence to Ruiyu Liang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liang, R., Xie, Y., Cheng, J. et al. Real-time speech enhancement algorithm for transient noise suppression. Multimed Tools Appl 80, 3681–3702 (2021). https://doi.org/10.1007/s11042-020-09849-8


