Abstract
This paper presents a method for the multi-speaker voiced/unvoiced decision on monaural speech. The approach is based on multi-scale product (MP) analysis of the composite signal. It computes the distances between the maxima and the minima of the MP of the mixture, then analyzes these distances to make the voicing decision for each of the two speech signals forming the mixture. Experiments are performed on the Cooke and Keele databases and on some mixtures from the GRID database. The results indicate that the proposed approach is robust and effective.
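To make the idea of measuring distances between MP extrema concrete, here is a minimal Python sketch. The abstract does not specify the wavelet, the scales, or the decision rule, so everything in the code is an assumption made for illustration: a first-derivative-of-Gaussian wavelet, the scales 2, 4 and 8 samples, a relative amplitude threshold of 0.3, and the helper names themselves. It handles a single periodic source only and is not the authors' two-speaker algorithm.

```python
import math


def gaussian_derivative(scale, half_width=4.0):
    """Sampled first derivative of a Gaussian (an assumed analysing wavelet)."""
    n = int(math.ceil(half_width * scale))
    g = [-k * math.exp(-(k * k) / (2.0 * scale * scale)) for k in range(-n, n + 1)]
    norm = sum(abs(v) for v in g)
    return [v / norm for v in g]  # unit L1 norm keeps the scales comparable


def convolve_same(x, kernel):
    """Zero-padded convolution that returns len(x) samples (odd-length kernel)."""
    half = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            i = n - (j - half)  # kernel tap j corresponds to lag k = j - half
            if 0 <= i < len(x):
                acc += x[i] * w
        out.append(acc)
    return out


def multiscale_product(x, scales=(2, 4, 8)):
    """Pointwise product of the wavelet responses of x at the given scales."""
    mp = [1.0] * len(x)
    for s in scales:
        response = convolve_same(x, gaussian_derivative(s))
        mp = [a * b for a, b in zip(mp, response)]
    return mp


def prominent_extrema(y, rel_threshold=0.3):
    """Indices of local maxima and minima with |y| >= rel_threshold * max|y|."""
    floor = rel_threshold * max(abs(v) for v in y)
    maxima, minima = [], []
    for i in range(1, len(y) - 1):
        if abs(y[i]) < floor:
            continue  # ignore small ripples between the sharp transitions
        if y[i - 1] < y[i] >= y[i + 1]:
            maxima.append(i)
        elif y[i - 1] > y[i] <= y[i + 1]:
            minima.append(i)
    return maxima, minima


def spacings(indices):
    """Distances, in samples, between consecutive extrema."""
    return [b - a for a, b in zip(indices, indices[1:])]


if __name__ == "__main__":
    # Hypothetical test signal: a sawtooth with a period of 80 samples.
    x = [(n % 80) / 80.0 for n in range(800)]
    _, minima = prominent_extrema(multiscale_product(x))
    print(spacings(minima))
```

On a periodic signal with one sharp transition per cycle, the spacings between prominent MP extrema should stay close to the period, whereas irregular spacings would suggest an unvoiced segment. How such distances are turned into a decision for each speaker in a mixture is the subject of the paper and is not reproduced here.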
Cite this article
Zeremdini, J., Ben Messaoud, M.A. & Bouzid, A. Two-speaker Voiced/Unvoiced Decision for Monaural Speech. Circuits Syst Signal Process 39, 4399–4415 (2020). https://doi.org/10.1007/s00034-020-01373-2