
Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech

Circuits, Systems, and Signal Processing

Abstract

Aligning a transcription to speech has applications in video subtitling, human–computer interaction through natural language, and related areas. Despite many advances, it remains a challenging task, and it becomes even more challenging for dysarthric speech. Dysarthria is a motor speech disorder resulting from damage to the peripheral or central nervous system; it causes a slow speaking rate, pronunciation deviations, and prolonged pauses between words and syllables. One problem in aligning dysarthric speech to text is the presence of repetition, which can occur at the syllable, word, or phrase level. In this work, we propose an algorithm for syllable boundary detection followed by syllable repetition detection in dysarthric speech. When a syllable is found to be repeated, it is automatically repeated in the transcription as well. The modified transcription is then given to the aligner along with the dysarthric speech. When tested for word alignment on 15 utterances containing 146 words, the proposed system gave a root mean square error (RMSE) of 0.138, compared with an RMSE of 0.276 for existing work in the literature.
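The two steps the abstract names, repeating a detected syllable in the transcription and scoring word alignment by RMSE, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the syllable-list representation of the transcription, and the `repeats` dictionary (syllable index mapped to the number of extra occurrences a detector reported) are assumptions made for illustration, and the example data are invented.

```python
import math


def insert_repetitions(syllables, repeats):
    """Return a new syllable list in which the syllable at index i
    appears 1 + repeats.get(i, 0) times.

    `repeats` is a hypothetical detector output: it maps a syllable
    index to the number of extra repetitions heard in the audio.
    """
    modified = []
    for i, syllable in enumerate(syllables):
        modified.extend([syllable] * (1 + repeats.get(i, 0)))
    return modified


def rmse(predicted, reference):
    """Root mean square error between two equal-length lists of
    word boundary values (for example, boundary times)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Invented example: the first syllable was spoken three times in total.
transcript = ["wa", "ter", "bot", "tle"]
print(insert_repetitions(transcript, {0: 2}))
```

The modified syllable list stands in for the modified transcription that would be passed to the aligner; `rmse` would then compare the aligner's word boundaries with manually marked ones.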



Acknowledgements

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Science and Engineering Research Board, Department of Science and Technology (SERB-DST), Government of India (No. YSS/2014/000563).

Author information

Corresponding author

Correspondence to G. Diwakar.

Cite this article

Diwakar, G., Karjigi, V. Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech. Circuits Syst Signal Process 39, 5543–5567 (2020). https://doi.org/10.1007/s00034-020-01419-5

  • DOI: https://doi.org/10.1007/s00034-020-01419-5
