Improved phase aware speech enhancement using bio-inspired and ANN techniques

Analog Integrated Circuits and Signal Processing

Abstract

Modifying the phase of a noisy speech signal plays a crucial role in speech enhancement (SE). Many recent speech denoising algorithms modify the phase information using a scaling factor computed from the noise level, and SE performance is significantly affected by both this scaling factor and the noise level estimate. However, in these algorithms the parameters are not optimally tuned for different noise conditions, and in some cases the background noise is presumed to be stationary. Furthermore, no earlier attempt has been made to obtain adaptive models that establish the relationship between noise level and scaling factor. Motivated by these observations, this paper develops a neural network based model capable of estimating the scaling factor from the noise level. A popular and efficient bio-inspired technique, the firefly algorithm, is employed to determine the best possible scaling factor for each noise level. The relationship between noise level and scaling factor is then established using a trigonometric functional expansion based artificial neural network. An effective nonstationary noise estimation strategy is also incorporated in the proposed algorithm. Simulation-based experiments are performed to evaluate the effectiveness of the proposed SE algorithm, which is compared with six other standard SE algorithms on a standard database. Analysis of the simulation results demonstrates that the proposed method outperforms the others in terms of both subjective and objective evaluation measures.
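The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the cost function, the parameter settings, or the network size, so everything named here (`toy_cost`, `firefly_minimize`, `expand`, `train_flann`, `predict`, `run_demo`, the search range 0 to 5, and the noise levels 0 to 20 dB) is a hypothetical stand-in. A basic firefly search finds a scaling factor for each noise level, and a trigonometric functional expansion with LMS weight updates then fits the mapping from noise level to scaling factor.

```python
import math
import random


def firefly_minimize(cost, lower, upper, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Basic 1-D firefly search; returns the best position found."""
    rng = random.Random(seed)
    span = upper - lower
    xs = [rng.uniform(lower, upper) for _ in range(n_fireflies)]
    costs = [cost(x) for x in xs]
    best_cost, best_x = min(zip(costs, xs))
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter than i
                    r = abs(xs[i] - xs[j]) / span
                    beta = beta0 * math.exp(-gamma * r * r)  # attractiveness
                    step = alpha * span * (rng.random() - 0.5)  # random walk
                    x_new = xs[i] + beta * (xs[j] - xs[i]) + step
                    xs[i] = min(max(x_new, lower), upper)  # stay in bounds
                    costs[i] = cost(xs[i])
                    if costs[i] < best_cost:
                        best_cost, best_x = costs[i], xs[i]
        alpha *= 0.95  # gradually shrink the random step
    return best_x


def expand(x, order=2):
    """Trigonometric functional expansion of a scalar input in [-1, 1]."""
    feats = [1.0, x]
    for k in range(1, order + 1):
        feats += [math.sin(k * math.pi * x), math.cos(k * math.pi * x)]
    return feats


def train_flann(xs, ys, order=2, mu=0.05, epochs=500):
    """Fit a single-layer network on expanded inputs with LMS updates."""
    w = [0.0] * (2 + 2 * order)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            phi = expand(x, order)
            err = y - sum(wi * p for wi, p in zip(w, phi))
            w = [wi + mu * err * p for wi, p in zip(w, phi)]
    return w


def predict(w, x, order=2):
    return sum(wi * p for wi, p in zip(w, expand(x, order)))


def toy_cost(scale, noise_db):
    # HYPOTHETICAL objective standing in for a speech quality measure:
    # it simply assumes the ideal scaling factor falls linearly with level.
    target = 4.0 - 0.15 * noise_db
    return (scale - target) ** 2


def run_demo():
    noise_levels_db = [0.0, 5.0, 10.0, 15.0, 20.0]  # assumed test levels
    best_scales = [firefly_minimize(lambda s, n=n: toy_cost(s, n), 0.0, 5.0)
                   for n in noise_levels_db]
    xs = [n / 10.0 - 1.0 for n in noise_levels_db]  # map 0..20 dB to [-1, 1]
    weights = train_flann(xs, best_scales)
    fitted = [predict(weights, x) for x in xs]
    return noise_levels_db, best_scales, fitted
```

Calling `run_demo()` returns the noise levels, the scaling factors found by the search, and the values the fitted network reproduces at those levels. In a real system the toy cost would be replaced by an evaluation of the enhanced speech, which this sketch does not attempt.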

Author information

Correspondence to Tusar Kanti Dash.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dash, T.K., Solanki, S.S. & Panda, G. Improved phase aware speech enhancement using bio-inspired and ANN techniques. Analog Integr Circ Sig Process 102, 465–477 (2020). https://doi.org/10.1007/s10470-019-01566-z

  • DOI: https://doi.org/10.1007/s10470-019-01566-z
