Monophone-based connected word Hindi speech recognition improvement

Sādhanā

Abstract

In this paper, a model is proposed to improve monophone-based connected word speech recognition for Hindi using hidden Markov models (HMMs). The model combines hybrid subword units with domain-specific syntactic structures. The hybrid units comprise both phoneme- and syllable-based subword units. Because syllable-based units cover a larger acoustic span, contextual effects are reduced. In the hybrid model, syllable-based acoustic units are used only for nasal sounds, to improve their recognition scores. A further improvement comes from using syntactic structures in the grammar definition during recognition: domain-specific syntactic structures reduce the recognizer’s search space and consequently improve system performance. Two grammar definitions were applied: gram1, with no restrictions, and gram2, with domain-specific structures. The speech recognition framework was implemented using the HMM-based toolkit HTK with five-state HMMs. A self-created connected word speech dataset with a vocabulary of 240 Hindi words was used. Mel frequency cepstral coefficients (MFCCs), MFCCs with energy (MFCC_E), and perceptual linear prediction coefficients with energy (PLP_E) were used for feature extraction. Monophones were trained with and without silence fixing to assess the impact of short pauses on the recognizer’s performance. The system was tested in both speaker-dependent and speaker-independent modes. The hybrid model combined with gram2 and silence fixing provided the best results. Using MFCCs, gram2, phoneme-based modelling, and silence fixing, the system obtained an overall word accuracy of 80.28%, word correct of 80.28%, and a word error rate of 19.72%. Using PLP_E coefficients, the hybrid model, silence fixing, and gram2, it obtained an overall word accuracy of 88.54%, word correct of 88.54%, and a word error rate of 11.46%.
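
The three scores quoted in the abstract are related by simple arithmetic. The sketch below assumes the conventional HTK-style definitions (percent correct ignores insertion errors, percent accuracy penalizes them, and word error rate is the complement of accuracy); whether the paper computed its scores exactly this way is an assumption, and the function name and error counts are hypothetical illustrations, not data from the study. Under these definitions, accuracy equals correct only when no insertions are counted, which would be consistent with the identical accuracy and correct values reported above.

```python
def htk_style_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy, word_error_rate).

    Assumed HTK-style definitions, with N reference words:
      correct  = (N - D - S) / N * 100
      accuracy = (N - D - S - I) / N * 100
      WER      = (S + D + I) / N * 100  (equals 100 - accuracy)
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be a positive number of reference words")
    hits = n_ref - deletions - substitutions
    correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    wer = 100.0 * (substitutions + deletions + insertions) / n_ref
    return correct, accuracy, wer


# Hypothetical counts, chosen only to illustrate the relationships.
no_insertions = htk_style_scores(n_ref=1000, deletions=40, substitutions=60, insertions=0)
with_insertions = htk_style_scores(n_ref=1000, deletions=40, substitutions=60, insertions=20)
print(no_insertions)    # correct and accuracy coincide when insertions are zero
print(with_insertions)  # insertions lower accuracy but leave correct unchanged
```

With zero insertions the first two values coincide and the word error rate is their complement, which matches the pattern of the reported figures (80.28% + 19.72% = 100% and 88.54% + 11.46% = 100%).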

Acknowledgement

The authors would like to acknowledge the Ministry of Electronics and Information Technology (MeitY), Government of India, for providing financial assistance for this research work through “Visvesvaraya PhD Scheme for Electronics and IT”.

Author information

Corresponding author

Correspondence to SHOBHA BHATT.

Cite this article

BHATT, S., JAIN, A. & DEV, A. Monophone-based connected word Hindi speech recognition improvement. Sādhanā 46, 99 (2021). https://doi.org/10.1007/s12046-021-01614-3

  • DOI: https://doi.org/10.1007/s12046-021-01614-3
