Abstract
Aligning a transcription to speech has applications in video subtitling, human–computer interaction through natural language communication, and related areas. Despite many advances, aligning transcription to speech remains a challenging task, and it can be even more challenging for dysarthric speech. Dysarthria is a motor speech disorder that results from damage to the peripheral or central nervous system and causes a slow speaking rate, pronunciation deviations, and prolonged pauses between words and syllables. One problem in aligning dysarthric speech to text is the presence of repetition, which can occur at the syllable, word, or phrase level. In this work, we propose an algorithm for syllable boundary detection followed by syllable repetition detection in dysarthric speech. When a syllable is found to be repeated in the speech, it is automatically repeated in the transcription as well. The modified transcription is then given to the aligner along with the dysarthric speech. When tested for word alignment on 15 utterances containing 146 words, the proposed system achieved a root mean square error (RMSE) of 0.138, compared with an RMSE of 0.276 for existing work in the literature.
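The two steps the abstract describes most concretely, repeating a detected syllable in the transcription and scoring word alignment by RMSE, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of repetitions as a set of syllable indices, and the assumption that each flagged syllable is repeated exactly once are all hypothetical, and the abstract does not state the unit of the boundary values being compared.

```python
import math


def expand_transcription(syllables, repeated_indices):
    """Return a copy of `syllables` with each flagged syllable duplicated.

    `repeated_indices` is a hypothetical output format for a repetition
    detector: the positions of syllables that were heard twice in the audio.
    """
    expanded = []
    for i, syllable in enumerate(syllables):
        expanded.append(syllable)
        if i in repeated_indices:
            # Mirror the repetition in the text so the aligner sees
            # the same number of syllables as the speech contains.
            expanded.append(syllable)
    return expanded


def boundary_rmse(predicted, reference):
    """Root mean square error between two equal-length boundary lists."""
    if len(predicted) != len(reference):
        raise ValueError("boundary lists must have the same length")
    if not predicted:
        raise ValueError("boundary lists must not be empty")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, `expand_transcription(["ba", "na", "na"], {0})` duplicates only the first syllable, and `boundary_rmse` returns 0.0 when the predicted and reference boundaries coincide.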
Acknowledgements
The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Science and Engineering Research Board, Department of Science and Technology (SERB-DST), Government of India (No. YSS/2014/000563).
Cite this article
Diwakar, G., Karjigi, V. Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech. Circuits Syst Signal Process 39, 5543–5567 (2020). https://doi.org/10.1007/s00034-020-01419-5
DOI: https://doi.org/10.1007/s00034-020-01419-5