Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech

Diwakar, G.; Karjigi, Veena

doi:10.1007/s00034-020-01419-5

Published: 24 April 2020

Volume 39, pages 5543–5567, (2020)

Circuits, Systems, and Signal Processing

Abstract

Aligning a transcription to speech has applications in video subtitling and in human–computer interaction through natural language communication, among others. Despite many advances, it remains a challenging task, and it can be even more challenging for dysarthric speech. Dysarthria is a motor speech disorder that results from damage to the peripheral or central nervous system and causes a slow speaking rate, pronunciation deviations, and prolonged pauses between words and syllables. One problem in aligning dysarthric speech to text is the presence of repetition, which can occur at the syllable, word, or phrase level. In this work, we propose an algorithm for syllable boundary detection followed by syllable repetition detection in dysarthric speech. When a syllable is found to be repeated, it is automatically repeated in the transcription as well. The modified transcription is then given to the aligner along with the dysarthric speech. When tested for word alignment on 15 utterances containing 146 words, the proposed system yielded a root mean square error (RMSE) of 0.138, compared with an RMSE of 0.276 for the existing work in the literature.
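The two steps of the abstract that can be shown without the full article are the transcription modification and the RMSE evaluation. The Python sketch below illustrates them and is not the authors' implementation. The function names, the representation of the transcription as a list of syllables, and the form of the detector output (a list of repeated syllable indices) are assumptions made for illustration. The abstract does not state how boundaries or repetitions are detected, which word boundaries enter the error, or the unit of the RMSE, so none of these is modelled here.

```python
import math
from typing import List, Sequence


def repeat_syllables(syllables: Sequence[str], repeated: Sequence[int]) -> List[str]:
    """Duplicate each syllable whose index is listed in `repeated`.

    `repeated` stands in for the output of a repetition detector
    (hypothetical interface; the detector itself is not shown here).
    """
    flagged = set(repeated)
    modified: List[str] = []
    for index, syllable in enumerate(syllables):
        modified.append(syllable)
        if index in flagged:
            # Mirror the repetition heard in the audio in the text.
            modified.append(syllable)
    return modified


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between two equal-length boundary lists."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up example: the first syllable of "water" is spoken twice.
    print(repeat_syllables(["wa", "ter"], [0]))  # ['wa', 'wa', 'ter']
    # Made-up boundary values, only to show the call.
    print(rmse([0.50, 1.20, 2.00], [0.45, 1.30, 2.10]))
```

The modified syllable list would then be joined back into a transcription and passed to a forced aligner together with the audio; the aligner is outside the scope of this sketch.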

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Science and Engineering Research Board, Department of Science and Technology (SERB-DST), Government of India (No. YSS/2014/000563).

Author information

Authors and Affiliations

Department of Electronics and Communication, Siddaganga Institute of Technology - Tumakuru, Tumakuru, Karnataka, India
G. Diwakar & Veena Karjigi

Corresponding author

Correspondence to G. Diwakar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Diwakar, G., Karjigi, V. Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech. Circuits Syst Signal Process 39, 5543–5567 (2020). https://doi.org/10.1007/s00034-020-01419-5

Received: 11 May 2019
Revised: 03 April 2020
Accepted: 06 April 2020
Published: 24 April 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s00034-020-01419-5
