
Real-time speech enhancement algorithm for transient noise suppression


Abstract

To effectively suppress both stationary and transient noise, a real-time single-channel speech enhancement algorithm is proposed. First, quantile-based noise estimation is used to obtain the spectrum of the stationary noise. Then, a transient noise detection method based on the normalized variance and the spectral center of gravity of the signal is proposed to modify the estimated noise spectrum. Next, the speech presence probability is estimated from speech features and harmonic analysis. Finally, the optimally modified log-spectral amplitude (OM-LSA) estimator is applied for speech enhancement. The noise set used in the experiments contains 115 environmental sounds, mixed at SNRs from −10 dB to 10 dB. The experimental results show that the denoising performance of the proposed algorithm is comparable to that of the OM-LSA algorithm, while its real-time performance is much better. Compared with the WebRTC real-time algorithm, over the combined stationary and transient noise conditions, the overall speech quality indicators of the proposed algorithm improve by 7.5%, 7.8%, and 5.0%, respectively, and the short-time objective intelligibility improves by 2.4%, 2.4%, and 2.0%, respectively. Even compared with a recurrent neural network (RNN) based algorithm, the transient noise suppression is better. In addition, a real-time experiment on a hardware platform shows that processing a 10 ms frame takes 4.3 ms.
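The following Python sketch (not part of the paper) illustrates the kind of per-frame pipeline the abstract describes: quantile-based stationary noise estimation, transient detection from the normalized variance and spectral center of gravity, a speech presence probability, and an OM-LSA-style gain. All constants and thresholds, the simplified a priori SNR and speech-presence estimates, and the omission of overlap-add are assumptions for illustration, not the authors' implementation.

```python
# Illustrative per-frame pipeline (assumed constants; not the authors' code).
import numpy as np
from scipy.special import exp1  # exponential integral used by the LSA gain

FRAME = 160        # 10 ms frame at an assumed 16 kHz sample rate
NFFT = 256
QUANTILE = 0.5     # quantile for stationary noise estimation (assumed)
HISTORY = 50       # past frames kept per frequency bin (assumed)

def quantile_noise(psd_history):
    """Stationary noise PSD as a per-bin quantile of recent frame PSDs."""
    return np.quantile(np.array(psd_history), QUANTILE, axis=0)

def transient_detected(frame, psd, var_thr=2.0, centroid_thr=0.6):
    """Flag a transient from the normalized variance and the spectral
    center of gravity; both thresholds are illustrative assumptions."""
    norm_var = np.var(frame) / (np.mean(np.abs(frame)) ** 2 + 1e-12)
    bins = np.arange(len(psd))
    centroid = np.sum(bins * psd) / ((np.sum(psd) + 1e-12) * len(psd))
    return norm_var > var_thr and centroid > centroid_thr

def omlsa_gain(xi, gamma, p, g_min=0.1):
    """OM-LSA-style gain: LSA gain weighted by speech presence probability p."""
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)
    g_lsa = (xi / (1.0 + xi)) * np.exp(0.5 * exp1(v))
    return np.minimum(g_lsa, 1.0) ** p * g_min ** (1.0 - p)

def enhance_frame(frame, psd_history):
    spec = np.fft.rfft(frame, NFFT)
    psd = np.abs(spec) ** 2
    psd_history.append(psd)
    if len(psd_history) > HISTORY:
        psd_history.pop(0)

    noise = quantile_noise(psd_history)
    if transient_detected(frame, psd):
        # Simplified stand-in for the paper's noise-spectrum modification:
        # inflate the noise estimate toward the instantaneous PSD.
        noise = np.maximum(noise, 0.8 * psd)

    gamma = psd / (noise + 1e-12)                 # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 1e-3)            # crude a priori SNR (placeholder)
    p = np.clip((gamma - 1.0) / 3.0, 0.05, 0.99)  # placeholder speech presence prob.
    return np.fft.irfft(omlsa_gain(xi, gamma, p) * spec, NFFT)[:FRAME]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(FRAME * 100)      # stand-in noisy signal
    history = []
    out = [enhance_frame(noisy[i * FRAME:(i + 1) * FRAME], history) for i in range(100)]
    print("processed", len(out), "frames")
```

In a full implementation the a priori SNR would normally be tracked with a decision-directed estimator and the frames recombined by overlap-add, as in the standard OM-LSA formulation.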



Acknowledgments

The authors would like to thank the reviewers for their valuable suggestions and comments. They also thank Mr. Chao He for his excellent work on the algorithm design and programming. This work was supported in part by the National Key Research and Development Program of China under Grants 2020YFC2004003 and 2020YFC2004002, and by the National Natural Science Foundation of China under Grant 62001215.

Author information

Corresponding author

Correspondence to Ruiyu Liang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liang, R., Xie, Y., Cheng, J. et al. Real-time speech enhancement algorithm for transient noise suppression. Multimed Tools Appl 80, 3681–3702 (2021). https://doi.org/10.1007/s11042-020-09849-8


