A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR

  • Original Research
International Journal of Information Technology

Abstract

Automatic speech recognition (ASR) aims to automate natural speech perception and processing by analysing the linguistic and acoustic features of the speech signal. ASR for children is highly challenging because of their developing physiology and rapidly changing articulation, and it is therefore still in its infancy. In this work, a stacked multilayer auto-encoder (AE) network is designed for ASR of Malayalam vowels articulated by children aged five to ten. The proposed network is structured as unsupervised pre-training followed by supervised training. Pre-training uses two layers of sparse auto-encoders, with the scaled conjugate gradient (SCG) algorithm used for back-propagation. The auto-encoders pre-train the network in an unsupervised (self-supervised) manner on 40,500 features that include Mel frequency cepstral coefficients (MFCC) and their derivatives, spectrogram formants, and zero crossing rate (ZCR). In the softmax layer, the pre-trained network is retrained in a supervised manner on the bottleneck features. The unsupervised and supervised layers are stacked together to form a single network, which is then fine-tuned to enhance its performance. The designed network achieved an average accuracy of 97% on the training data and 89.5% on the test data set.
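The training pipeline described in the abstract (greedy pre-training of two auto-encoders, then a softmax layer trained on the bottleneck features) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the data are random placeholders rather than speech features, the layer sizes (20, 12, 6) and the five classes are hypothetical, plain gradient descent stands in for SCG, sparsity is approximated by a simple L1 penalty on the hidden activations, and the final fine-tuning pass over the stacked network is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, l1=1e-3):
    """Train one sigmoid-encoder / linear-decoder auto-encoder.

    Uses plain gradient descent (an assumption; the paper uses SCG).
    Returns the encoder parameters and the reconstruction-error history.
    """
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # encode
        R = H @ W2 + b2                       # decode (reconstruction)
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        dR = 2.0 * err / err.size             # gradient of mean squared error
        dH = dR @ W2.T + l1 / H.size          # plus gradient of l1 * mean(H)
        dZ = dH * H * (1.0 - H)               # back through the sigmoid
        W2 -= lr * (H.T @ dR); b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ); b1 -= lr * dZ.sum(axis=0)
    return (W1, b1), losses

def encode(X, params):
    W, b = params
    return sigmoid(X @ W + b)

def train_softmax(H, y, n_classes, epochs=200, lr=0.5):
    """Supervised softmax layer trained on the bottleneck features H."""
    n, d = H.shape
    W = np.zeros((d, n_classes)); b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot targets
    for _ in range(epochs):
        G = (softmax(H @ W + b) - Y) / n      # cross-entropy gradient
        W -= lr * (H.T @ G); b -= lr * G.sum(axis=0)
    return W, b

# Placeholder data: 60 samples, 20 features, 5 hypothetical vowel classes.
X = rng.random((60, 20))
y = rng.integers(0, 5, size=60)

enc1, loss1 = train_autoencoder(X, n_hidden=12)   # first auto-encoder
H1 = encode(X, enc1)
enc2, loss2 = train_autoencoder(H1, n_hidden=6)   # second, on the codes of the first
H2 = encode(H1, enc2)                             # bottleneck features
Wc, bc = train_softmax(H2, y, n_classes=5)

# Stacked network: encoder 1 -> encoder 2 -> softmax.
probs = softmax(encode(encode(X, enc1), enc2) @ Wc + bc)
```

Each auto-encoder is trained only on the output of the layer below it, which is what makes the pre-training stage independent of the class labels; the labels enter only when the softmax layer is fitted.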

Abbreviations

ASR: Automatic speech recognition
AE: Auto-encoder
MFCC: Mel frequency cepstral coefficients
SCG: Scaled conjugate gradient
ZCR: Zero crossing rate
HMM: Hidden Markov model
ANNs: Artificial neural networks
DBN: Deep belief network
RBM: Restricted Boltzmann machine
MOM: Method of moments

Author information

Correspondence to D. Muhammad Noorul Mubarak.

Cite this article

Pillai, L.G., Mubarak, D.M.N. A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR. Int. j. inf. tecnol. 13, 1473–1479 (2021). https://doi.org/10.1007/s41870-020-00573-y

  • DOI: https://doi.org/10.1007/s41870-020-00573-y
