
Two-speaker Voiced/Unvoiced Decision for Monaural Speech

Circuits, Systems, and Signal Processing

Abstract

This paper presents a method for multi-speaker voiced/unvoiced decision in monaural speech. The approach is based on multi-scale product (MP) analysis of the composite signal. It computes the distances between the maxima and the minima of the MP, then analyzes these distances to make the voicing decision for both speech signals forming the mixture. Experiments are performed on the Cooke and Keele databases and on some mixtures from the GRID database. The results show the robustness and effectiveness of the proposed approach.
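The abstract does not give the details of the MP analysis, so the following is only a minimal sketch of the general idea: a multi-scale product formed by multiplying wavelet-like responses of a signal at a few scales, followed by measuring the distances between its local maxima and minima. The derivative-of-Gaussian kernel, the scales `(2, 4, 8)`, and the helper names `multiscale_product` and `extrema_distances` are illustrative assumptions, not the authors' implementation, and the two-speaker decision rule itself is not reproduced here.

```python
import numpy as np


def derivative_of_gaussian(scale, width=4.0):
    """First-derivative-of-Gaussian kernel (assumed wavelet, scale in samples)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    # Normalise so that responses at different scales are comparable.
    return psi / np.sum(np.abs(psi))


def multiscale_product(x, scales=(2.0, 4.0, 8.0)):
    """Pointwise product of the filtered signal over several scales.

    The scales are assumed values; the signal should be longer than the
    largest kernel (2 * ceil(4 * max(scales)) + 1 samples).
    """
    x = np.asarray(x, dtype=float)
    mp = np.ones_like(x)
    for s in scales:
        mp *= np.convolve(x, derivative_of_gaussian(s), mode="same")
    return mp


def local_extrema(y):
    """Indices of local maxima and local minima of a 1-D array."""
    d = np.diff(y)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def extrema_distances(y):
    """Distances (in samples) between consecutive extrema of y."""
    maxima, minima = local_extrema(y)
    idx = np.sort(np.concatenate([maxima, minima]))
    return np.diff(idx)
```

For a sampled frame `x`, `extrema_distances(multiscale_product(x))` returns the spacing between consecutive extrema. How regular that spacing must be for a frame to count as voiced, and how it is attributed to each speaker, is left open here because the abstract does not specify it.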

References

  1. M. Algabri, M. Alsulaiman, G. Muhammad, M. Zakariah, M. Bencherif, Z. Ali, Voice and unvoiced classification using fuzzy logic, in International Conference on IP, Computer Vision, and Pattern Recognition, (IPCV, 2015)

  2. R.G. Bachu, S. Kopparthi, B. Adapa, B.D. Barkana, Separation of voiced and unvoiced using zero crossing rate and energy of the speech signal, in Advanced Techniques in Computing Sciences and Software Engineering, (Springer, 2009), pp. 279–282

  3. M.A. Ben Messaoud, A. Bouzid, N. Ellouze, A new biologically inspired fuzzy expert system-based voiced/unvoiced decision algorithm for speech enhancement. Cogn. Comput. 8(3), 478–493 (2016)

  4. M.A. Ben Messaoud, A. Bouzid, N. Ellouze, Estimation du Pitch et Décision de Voisement par Compression Spectrale de l’Autocorrélation du Produit Multi-échelle, in Actes de la conférence conjointe JEP-TALN-RECITAL, vol. 1 (2012) pp. 201–208

  5. M.A. Ben Messaoud, A. Bouzid, N. Ellouze, Autocorrelation of the speech multi-scale product for voicing decision and pitch estimation. Cogn. Comput. 2(3), 151–159 (2010)

  6. F. Beritelli, S. Casale, Robust voiced/unvoiced speech classification using fuzzy rules, in IEEE Workshop on Speech Coding For Telecommunications Proceeding (2013)

  7. M.P. Cooke, J. Barker, An audio-visual corpus for speech perception and automatic speech recognition. J. Acoust. Soc. Am. 120(5), 2421–2424 (2006)

  8. M.P. Cooke, J.R. Hershey, S.J. Rennie, Monaural speech separation and recognition challenge. Comput. Speech Lang. J. 24(1), 1–15 (2010)

  9. N.F. Hassan, H. Bahjat Abdul Wahab, Proposed a new approach for voiced/unvoiced decision of speech file using lagrange technique. Telecommun. Radio Eng. 72(6), 495–504 (2013)

  10. K. Kasi, S.A. Zahorian, Yet another algorithm for pitch tracking, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Orlando, 2002) pp. 13–17

  11. K. Khaldi, A.O. Boudraa, M. Turki, Voiced/unvoiced speech classification-based adaptive filtering of decomposed empirical modes for speech enhancement. IET Signal Process. 10(1), 69–80 (2016)

  12. Y. Kong, Your wavelet based pitch detection and voiced/unvoiced decision. Am. J. Eng. Technol. Res. 13(1), 27 (2013)

  13. Y. Liu, D. Wang, Speaker-dependent multipitch tracking using deep neural networks. J. Acoust. Soc. Am. 141(2), 710 (2017)

  14. L. Ming, C. Chuan, W. Di, L. Ping, F. Qiang, Y. Yonghong, Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping, in Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH), (Brisbane, 2008), pp. 151–154

  15. F. Plante, G.F. Meyer, W.A. Ainsworth, A pitch extraction reference database, in ESCA EUROSPEECH’95 4th European Conference on Speech Communication and Technology, (Madrid, 1995), ISSN 1018-4074, pp. 837–840

  16. A. Rosenfeld, Non-linear edge detection. Proc. IEEE 58, 814–816 (1970)

  17. V. Srikanth, E.W. Carol, An algorithm for multi-pitch tracking in co-channel speech, in 9th Annual Conference of the International Speech Communication Association (INTERSPEECH), (Brisbane, 2008)

  18. S.B. Sunil Kumar, K. Sreenivasa Rao, Voice/non-voice detection using phase of zero frequency filtered speech signal. Speech Commun. 81, 90–103 (2016)

  19. M.R.P. Thomas, J. Gudnason, P.A. Naylor, Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm. IEEE Trans. Audio Speech Lang. Process. 20(1), 82–91 (2012)

  20. A. Upadhyay, R.B. Pachori, Instantaneous voiced/non-voiced detection in speech signals based on variational mode decomposition. J. Frankl. Inst. 352(7), 2679–2707 (2015)

  21. A. Vinayak, S. Pulkit, S. Anil Kumar, Voiced/nonvoiced detection in compressively sensed speech signal. Speech Commun. 72, 194–207 (2015)

  22. A. Waghela, R. Reddy, S. Rai, A. Pawar, N. Gharat, SUV detection algorithm for speech signals. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 958 (2014)

  23. M. Wasserblat, M. Gainza, D. Dorran, Y. Domb, Pitch tracking and voiced/unvoiced detection in noisy environment using optimal sequence estimation, in Signals and Systems Conference (ISSC), (IET Irish Galway 2008)

  24. B.F. Wu, K.C. Wang, Voice activity detection based on auto-correlation function using wavelet transform and teager energy operator. Comput. Linguist. Chin. Lang. Process. 11(1), 87–100 (2006)

  25. J. Zeremdini, M.A. Ben Messaoud, A. Bouzid, N. Ellouze, Contribution to the multi-pitch estimation by multi-scale product analysis, in NOLISP 2013, (Mons, 2013)

  26. J. Zeremdini, M.A. Ben Messaoud, A. Bouzid, Multiple comb filters and autocorrelation of the multi-scale product for multi-pitch estimation. Appl. Acoust. 120, 45–53 (2017)

  27. J. Zeremdini, M.A. Ben Messaoud, A. Bouzid, Multi-pitch estimation based on multi-scale product analysis, improved comb filter and dynamic programming. Int. J. Speech Technol. 20, 1–13 (2017)

Author information

Corresponding author

Correspondence to Jihen Zeremdini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zeremdini, J., Ben Messaoud, M.A. & Bouzid, A. Two-speaker Voiced/Unvoiced Decision for Monaural Speech. Circuits Syst Signal Process 39, 4399–4415 (2020). https://doi.org/10.1007/s00034-020-01373-2

  • DOI: https://doi.org/10.1007/s00034-020-01373-2
