VOP detection for read and conversation speech using CWT coefficients and phone boundaries

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we propose a novel approach for accurate detection of vowel onset points (VOPs). A VOP is the instant at which a vowel begins in a speech signal. Precise identification of VOPs is important for speech applications such as speech segmentation and speech rate modification. Existing methods detect the majority of VOPs to within a 40 ms deviation, which may not be accurate enough for these applications. To address this issue, we propose a two-stage approach. In the first stage, VOPs are detected using continuous wavelet transform (CWT) coefficients; in the second stage, the positions of the detected VOPs are corrected using phone boundaries, which are detected by the spectral transition measure method. Experiments are conducted on the TIMIT and Bengali speech corpora. The performance of the proposed approach is compared with two standard signal-processing-based methods and with a recent VOP detection technique. The evaluation results show that the proposed method performs better than the existing methods.
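
The second stage and the deviation-based evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the correction rule, so snapping each first-stage VOP to the nearest phone boundary is an assumption, as are the function names and the parameters `half_width`, `max_shift`, and `tol`. The spectral transition measure is computed in its commonly used form, the mean squared regression slope of each feature trajectory over a short window of frames; the first-stage CWT detector is not reproduced here, and its output is represented by hypothetical VOP times.

```python
import numpy as np

def spectral_transition_measure(feats, half_width=2):
    """Per-frame STM: mean squared regression slope of each feature
    trajectory over a window of 2*half_width + 1 frames.
    feats has shape (num_frames, num_features)."""
    feats = np.asarray(feats, dtype=float)
    num_frames = feats.shape[0]
    offsets = np.arange(-half_width, half_width + 1)
    # Repeat the edge frames so every frame has a full window.
    padded = np.pad(feats, ((half_width, half_width), (0, 0)), mode="edge")
    # slope[m, i] = sum_k k * c_i(m + k) / sum_k k^2
    slopes = sum(
        k * padded[half_width + k : half_width + k + num_frames]
        for k in offsets
    ) / np.sum(offsets ** 2)
    return np.mean(slopes ** 2, axis=1)

def refine_vops(vops, boundaries, max_shift=0.04):
    """Assumed correction rule: move each VOP (seconds) to the nearest
    phone boundary if it lies within max_shift, else leave it alone."""
    boundaries = np.asarray(boundaries, dtype=float)
    if boundaries.size == 0:
        return np.asarray(vops, dtype=float)
    refined = []
    for t in vops:
        nearest = boundaries[np.argmin(np.abs(boundaries - t))]
        refined.append(nearest if abs(nearest - t) <= max_shift else t)
    return np.array(refined)

def detection_rate(detected, reference, tol):
    """Fraction of reference VOPs with a detection within +/- tol seconds."""
    detected = np.asarray(detected, dtype=float)
    hits = sum(bool(np.any(np.abs(detected - r) <= tol)) for r in reference)
    return hits / len(reference)

# Hypothetical values, for illustration only (times in seconds).
stage1_vops = [0.12, 0.50, 0.93]        # stand-in for first-stage output
phone_boundaries = [0.10, 0.48, 0.70]   # stand-in for STM peak locations
reference_vops = [0.10, 0.47, 0.95]     # stand-in for manual labels

stage2_vops = refine_vops(stage1_vops, phone_boundaries)
rate_before = detection_rate(stage1_vops, reference_vops, tol=0.015)
rate_after = detection_rate(stage2_vops, reference_vops, tol=0.015)
```

In this toy case the third VOP is left unchanged because no boundary lies within `max_shift`, which is one way to keep a distant boundary from pulling a detection away from its original position. Phone boundary candidates would in practice be taken from the peaks of the STM contour, with the frame index converted to seconds using the frame shift.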

Author information

Corresponding author

Correspondence to Kumud Tripathi.

Cite this article

Tripathi, K., Rao, K.S. VOP detection for read and conversation speech using CWT coefficients and phone boundaries. J Ambient Intell Human Comput 13, 105–116 (2022). https://doi.org/10.1007/s12652-020-02890-3

  • DOI: https://doi.org/10.1007/s12652-020-02890-3
