Abstract
In this paper, we propose a novel approach for accurate detection of vowel onset points (VOPs). A VOP is the instant at which a vowel begins in a speech signal. Precise identification of VOPs is important for speech applications such as speech segmentation and speech rate modification. Existing methods detect the majority of VOPs only to within a 40 ms deviation, which may not be precise enough for these applications. To address this issue, we propose a two-stage approach. In the first stage, VOPs are detected using continuous wavelet transform (CWT) coefficients. In the second stage, the positions of the detected VOPs are corrected using phone boundaries, which are detected by the spectral transition measure method. Experiments are conducted on the TIMIT and Bengali speech corpora. The performance of the proposed approach is compared with two standard signal-processing-based methods as well as with a recent VOP detection technique. The evaluation results show that the proposed method performs better than the existing methods.
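The second-stage correction can be pictured with a minimal Python sketch. It assumes that a candidate VOP is moved to the nearest phone boundary when one lies within a tolerance, and is otherwise left unchanged. The abstract does not state the actual correction rule, so the function name `snap_vops_to_boundaries`, the parameter `max_shift_ms`, and its 40 ms default are hypothetical and chosen only for illustration. The candidate VOPs and phone boundaries are taken as given lists of times in milliseconds.

```python
from bisect import bisect_left


def snap_vops_to_boundaries(vops_ms, boundaries_ms, max_shift_ms=40.0):
    """Move each candidate VOP to the nearest phone boundary.

    Assumed rule (not taken from the paper): a candidate is left
    unchanged when no boundary lies within max_shift_ms of it.
    """
    boundaries = sorted(boundaries_ms)
    corrected = []
    for vop in vops_ms:
        i = bisect_left(boundaries, vop)
        # Only the boundaries just before and just after can be nearest.
        neighbours = boundaries[max(i - 1, 0):i + 1]
        nearest = min(neighbours, key=lambda b: abs(b - vop), default=None)
        if nearest is not None and abs(nearest - vop) <= max_shift_ms:
            corrected.append(nearest)
        else:
            corrected.append(vop)
    return corrected


# Illustrative values only: three candidate VOPs and three boundaries (ms).
print(snap_vops_to_boundaries([120, 330, 395], [100, 250, 400]))
```

In this example the first and third candidates move to the boundaries at 100 ms and 400 ms, while the second stays at 330 ms because its nearest boundary is 70 ms away. Keeping far-off candidates unchanged is one possible design choice; it avoids pulling a VOP onto an unrelated boundary.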
Cite this article
Tripathi, K., Rao, K.S. VOP detection for read and conversation speech using CWT coefficients and phone boundaries. J Ambient Intell Human Comput 13, 105–116 (2022). https://doi.org/10.1007/s12652-020-02890-3