VOP detection for read and conversation speech using CWT coefficients and phone boundaries

Tripathi, Kumud; Rao, K. Sreenivasa

doi:10.1007/s12652-020-02890-3

Original Research
Published: 07 January 2021

Volume 13, pages 105–116, (2022)

Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we propose a novel approach for accurate detection of vowel onset points (VOPs). A VOP is the instant at which a vowel begins in a speech signal. Precise identification of VOPs is important for speech applications such as speech segmentation and speech rate modification. Existing methods detect the majority of VOPs only to within a 40 ms deviation, which may not be precise enough for these applications. To address this issue, we propose a two-stage approach. In the first stage, VOPs are detected using continuous wavelet transform (CWT) coefficients; in the second stage, the positions of the detected VOPs are corrected using phone boundaries, which are detected by the spectral transition measure method. Experiments are conducted on the TIMIT and Bengali speech corpora. The performance of the proposed approach is compared with two standard signal-processing-based methods and with a recent VOP detection technique. The evaluation results show that the proposed method performs better than the existing methods.
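
The second stage described above can be illustrated with a minimal sketch. The abstract does not state the exact correction rule, so the version below makes an assumption: each first-stage VOP candidate is moved to the nearest phone boundary, but only if that boundary lies within a tolerance. The function name `refine_vops`, the parameter `max_shift`, and the sample times are hypothetical and are not taken from the paper.

```python
import numpy as np

def refine_vops(candidate_vops, phone_boundaries, max_shift=0.04):
    """Snap each candidate VOP (seconds) to the nearest phone boundary.

    Assumption (not from the paper): a candidate is moved only when the
    nearest boundary is at most `max_shift` seconds away; otherwise the
    candidate is left where it is.
    """
    candidates = np.asarray(candidate_vops, dtype=float)
    boundaries = np.sort(np.asarray(phone_boundaries, dtype=float))
    if boundaries.size == 0:
        return candidates.copy()

    # For every candidate, find the boundaries immediately left and right.
    idx = np.searchsorted(boundaries, candidates)
    left = boundaries[np.clip(idx - 1, 0, boundaries.size - 1)]
    right = boundaries[np.clip(idx, 0, boundaries.size - 1)]

    # Choose whichever neighbour is closer to the candidate.
    nearest = np.where(np.abs(candidates - left) <= np.abs(right - candidates),
                       left, right)

    # Only accept the correction when it stays within the tolerance.
    close_enough = np.abs(nearest - candidates) <= max_shift
    return np.where(close_enough, nearest, candidates)

# Hypothetical first-stage VOP candidates and phone boundaries, in seconds.
candidates = [0.100, 0.500, 0.900]
boundaries = [0.110, 0.300, 0.620, 0.905]
refined = refine_vops(candidates, boundaries)
```

In this example the first and third candidates move to the boundaries at 0.110 s and 0.905 s, while the second stays at 0.500 s because no boundary lies within the tolerance. The default tolerance of 0.04 s is only an illustrative choice that echoes the 40 ms deviation mentioned in the abstract.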

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
Kumud Tripathi & K. Sreenivasa Rao

Corresponding author

Correspondence to Kumud Tripathi.

About this article

Cite this article

Tripathi, K., Rao, K.S. VOP detection for read and conversation speech using CWT coefficients and phone boundaries. J Ambient Intell Human Comput 13, 105–116 (2022). https://doi.org/10.1007/s12652-020-02890-3

Received: 24 May 2020
Accepted: 24 December 2020
Published: 07 January 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s12652-020-02890-3
