A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR

Pillai, Leena G.; Mubarak, D. Muhammad Noorul

doi:10.1007/s41870-020-00573-y

Original Research
Published: 31 March 2021

Volume 13, pages 1473–1479, (2021)
International Journal of Information Technology

Leena G. Pillai¹ & D. Muhammad Noorul Mubarak¹

Abstract

Automatic speech recognition (ASR) aims to automate natural speech perception and processing by analyzing the linguistic and acoustic features of the speech signal. ASR for children is highly challenging because of their developing physical characteristics and rapidly changing articulation, and it therefore remains at an early stage. In this work, a stacked multilayer auto-encoder (AE) network is designed for ASR of Malayalam vowels articulated by children aged five to ten. The proposed network is structured as unsupervised pre-training followed by supervised training. The pre-training uses two layers of sparse auto-encoders, with the scaled conjugate gradient (SCG) algorithm used for back-propagation. The auto-encoders pre-train the network in an unsupervised (self-supervised) manner with 40,500 features that include Mel frequency cepstral coefficients (MFCC) and their derivatives, spectrogram formants and zero crossing rate (ZCR). In the softmax layer, the pre-trained network is retrained in a supervised manner with bottleneck features. Fine-tuning is then applied to the trained network to enhance its performance. The unsupervised and supervised layers are stacked together to form a comprehensive network. The designed network achieved an average accuracy of 97% on the training data and 89.5% on the test dataset.
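
The architecture outlined in the abstract (two sparse auto-encoder layers feeding a softmax classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the number of vowel classes, the weight initialisation and the KL-divergence sparsity penalty are all assumptions chosen for illustration, since the abstract does not specify them. Training with the SCG algorithm is not shown, and only the Python standard library is used to keep the example self-contained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def kl_sparsity(mean_activations, rho=0.05):
    """KL-divergence sparsity penalty, a common choice for sparse auto-encoders.

    Assumption: the abstract does not say which sparsity regulariser was used.
    """
    return sum(
        rho * math.log(rho / a) + (1 - rho) * math.log((1 - rho) / (1 - a))
        for a in mean_activations
    )


def stacked_forward(x, enc1, enc2, clf):
    """Encoder 1 -> encoder 2 (bottleneck features) -> softmax classifier."""
    h1 = sigmoid(dense(x, enc1))
    h2 = sigmoid(dense(h1, enc2))
    return softmax(dense(h2, clf))


# Illustrative sizes only; the real feature and class counts are not assumed here.
N_FEATURES, N_H1, N_H2, N_CLASSES = 20, 10, 5, 4

rng = random.Random(0)
enc1 = init_layer(N_FEATURES, N_H1, rng)
enc2 = init_layer(N_H1, N_H2, rng)
clf = init_layer(N_H2, N_CLASSES, rng)

x = [rng.random() for _ in range(N_FEATURES)]  # stand-in for one feature vector
probs = stacked_forward(x, enc1, enc2, clf)
```

In a full pipeline of this kind, each encoder would first be trained to reconstruct its own input with the sparsity penalty added to the reconstruction loss, the softmax layer would then be trained on the second encoder's outputs, and the stacked network would finally be fine-tuned end to end.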

Abbreviations

ASR: Automatic speech recognition
AE: Auto-encoder
MFCC: Mel frequency cepstral coefficients
SCG: Scaled conjugate gradient
ZCR: Zero crossing rate
HMM: Hidden Markov model
ANNs: Artificial neural networks
DBN: Deep belief network
RBM: Restricted Boltzmann machine
MOM: Method of moments

Author information

Authors and Affiliations

Department of Computer Science, University of Kerala, Thiruvananthapuram, India
Leena G. Pillai & D. Muhammad Noorul Mubarak

Corresponding author

Correspondence to D. Muhammad Noorul Mubarak.

About this article

Cite this article

Pillai, L.G., Mubarak, D.M.N. A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR. Int. j. inf. tecnol. 13, 1473–1479 (2021). https://doi.org/10.1007/s41870-020-00573-y

Received: 23 September 2019
Accepted: 17 November 2020
Published: 31 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s41870-020-00573-y
