Two-speaker Voiced/Unvoiced Decision for Monaural Speech

Zeremdini, Jihen; Ben Messaoud, Mohamed Anouar; Bouzid, Aicha

doi:10.1007/s00034-020-01373-2

Published: 19 February 2020

Circuits, Systems, and Signal Processing, volume 39, pages 4399–4415 (2020)

Jihen Zeremdini (ORCID: orcid.org/0000-0001-5030-4430), Mohamed Anouar Ben Messaoud & Aicha Bouzid


Abstract

This paper presents a method for making the voiced/unvoiced decision for each speaker in monaural (single-channel) multi-speaker speech. The approach is based on multi-scale product (MP) analysis of the composite signal: it computes the distances between the maxima and the minima of the MP, then analyzes these distances to make the voicing decision for both speech signals forming the mixture. Experiments are performed on the Cooke and Keele databases and on some mixtures from the GRID database. The results show that the proposed approach is robust and effective.
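As a rough illustration of the quantities named in the abstract, the following sketch computes a multi-scale product and the distances between its extrema. It is not the authors' algorithm: the derivative-of-Gaussian wavelet, the three scales, the merging of maxima and minima into one sequence, and all function names are assumptions made here for illustration. The abstract does not state the decision rule applied to the distances, so the sketch stops at the distances themselves.

```python
import numpy as np

def gaussian_derivative_wavelet(scale, width=4.0):
    """First derivative of a Gaussian sampled on [-width*scale, width*scale].
    Assumption: this wavelet is an illustrative choice, not taken from the paper."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    g = -t / scale**2 * np.exp(-t**2 / (2.0 * scale**2))
    return g / np.sum(np.abs(g))  # L1 normalisation

def multiscale_product(x, scales=(1.0, 2.0, 4.0)):
    """Pointwise product of the wavelet responses of x at several scales.
    Assumption: three dyadic scales; x must be longer than the widest kernel."""
    mp = np.ones(len(x))
    for s in scales:
        mp *= np.convolve(x, gaussian_derivative_wavelet(s), mode="same")
    return mp

def local_extrema(y):
    """Indices of strict local maxima and strict local minima of a 1-D array."""
    d = np.diff(y)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def extrema_distances(y):
    """Distances, in samples, between consecutive extrema.
    Assumption: maxima and minima are merged into one ordered sequence."""
    maxima, minima = local_extrema(y)
    idx = np.sort(np.concatenate([maxima, minima]))
    return np.diff(idx)

if __name__ == "__main__":
    # A unit step at sample 100: the product peaks at the discontinuity.
    step = np.zeros(200)
    step[100:] = 1.0
    mp = multiscale_product(step)
    print("MP peak index:", int(np.argmax(mp)))
    print("extrema distances of MP:", extrema_distances(mp))
```

Multiplying the responses across scales reinforces sharp transitions that persist at every scale and attenuates responses that appear at only one scale, which is why the product is used to locate discontinuities in a signal.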



Author information

Authors and Affiliations

Signal, Image and Information Technology Laboratory, LR11ES17, University of Tunis El Manar, 1002, Tunis, Tunisia
Jihen Zeremdini, Mohamed Anouar Ben Messaoud & Aicha Bouzid


Corresponding author

Correspondence to Jihen Zeremdini.


About this article

Cite this article

Zeremdini, J., Ben Messaoud, M.A. & Bouzid, A. Two-speaker Voiced/Unvoiced Decision for Monaural Speech. Circuits Syst Signal Process 39, 4399–4415 (2020). https://doi.org/10.1007/s00034-020-01373-2

Received: 19 May 2018
Revised: 09 February 2020
Accepted: 11 February 2020
Published: 19 February 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00034-020-01373-2
