A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR, International Journal of Information Technology



A stacked auto-encoder with scaled conjugate gradient algorithm for Malayalam ASR
International Journal of Information Technology, Pub Date: 2021-03-31, DOI: 10.1007/s41870-020-00573-y
Leena G. Pillai, D. Muhammad Noorul Mubarak

Automatic speech recognition (ASR) aims to automate natural speech perception and processing by analysing the linguistic and acoustic features of the speech signal. ASR for children is highly challenging because of their developing physiology and rapidly changing articulation. As a result, ASR for children is still in its infancy. In this work, a stacked multilayer auto-encoder (AE) network is designed for ASR of Malayalam vowels articulated by children aged five to ten. The proposed network is structured as unsupervised pre-training followed by supervised training. The pre-training uses two layers of sparse auto-encoders, with the scaled conjugate gradient (SCG) algorithm used for back-propagation. The auto-encoders pre-train the network in an unsupervised (self-supervised) manner with 40,500 features that include Mel frequency cepstral coefficients (MFCC) and their derivatives, spectrogram formants and zero crossing rate (ZCR). In the softmax layer, the pre-trained network is retrained in a supervised manner with bottleneck features. Fine-tuning is then applied to the trained network to enhance its performance. The unsupervised and supervised layers are stacked together to form a comprehensive network. The designed network achieved an average accuracy of 97% on the training data and 89.5% on the test data-set.
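
The stacking described in the abstract can be illustrated with a minimal forward-pass sketch: two encoder layers (which would come from the pre-trained sparse auto-encoders) feed a softmax classifier. This is not the authors' implementation. The layer sizes, the class count, the sigmoid activation and the random placeholder weights are all assumptions made for illustration, and no training (SCG or otherwise) is performed here.

```python
import math
import random

# Hypothetical sizes; the abstract does not give layer widths or class count.
N_FEATURES, N_HIDDEN1, N_HIDDEN2, N_CLASSES = 12, 8, 4, 5


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, inputs):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Placeholder weights; a trained system would load learned values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def stacked_forward(features, enc1, enc2, clf):
    """Encoder 1 -> encoder 2 (bottleneck) -> softmax class probabilities."""
    w1, b1 = enc1
    w2, b2 = enc2
    wc, bc = clf
    h1 = [sigmoid(z) for z in dense(w1, b1, features)]
    h2 = [sigmoid(z) for z in dense(w2, b2, h1)]  # bottleneck features
    return softmax(dense(wc, bc, h2))


rng = random.Random(0)
enc1 = random_layer(rng, N_HIDDEN1, N_FEATURES)
enc2 = random_layer(rng, N_HIDDEN2, N_HIDDEN1)
clf = random_layer(rng, N_CLASSES, N_HIDDEN2)

# One made-up feature vector standing in for MFCC/formant/ZCR inputs.
features = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
probs = stacked_forward(features, enc1, enc2, clf)
print(probs)
```

In a pipeline like the one the abstract describes, `enc1` and `enc2` would be taken from the two auto-encoders after unsupervised pre-training, `clf` would be trained on the bottleneck features with labels, and all three would then be fine-tuned together.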


Updated: 2021-04-01
