Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition
arXiv - CS - Sound Pub Date : 2021-05-04 , DOI: arxiv-2105.01399
Arash Dehghani, Seyyed Ali Seyyedsalehi

In this paper, various structures and methods of Deep Artificial Neural Networks (DNN) are evaluated and compared for continuous Persian speech recognition. Among the first neural network models used in speech recognition applications were fully connected Neural Networks (FCNNs) and, subsequently, Deep Neural Networks (DNNs). Although these models perform better than GMM/HMM models, they lack a suitable structure for modeling local speech information. The Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech. Another issue that deep artificial neural networks face is convergence on the training data; the main obstacle to convergence is the presence of local minima in the training process. Pre-training methods for deep neural networks, despite their heavy computational cost, are powerful tools for escaping local minima, but using appropriate neuronal models in the network structure appears to be a better solution to this problem. The Rectified Linear Unit (ReLU) and Maxout models are the most suitable neuronal models proposed to date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying that these methods work properly, a combination of all models was applied to the FARSDAT speech database for continuous speech recognition. The experimental results show that the combined model (CMDNN) improves the performance of ANNs in speech recognition by about 3% compared to pre-trained fully connected NNs with sigmoid neurons.
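As a concrete illustration of the two neuronal models named above, the following is a minimal NumPy sketch of a maxout unit (the maximum over k affine "pieces") next to a ReLU, which corresponds to a two-piece maxout with one piece fixed at zero. The shapes, parameter names, and random inputs are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each of the m output units takes the
    maximum over k affine responses.

    x : (d,)       input vector
    W : (k, m, d)  weights of the k pieces for each of the m units
    b : (k, m)     biases of the k pieces
    returns : (m,) element-wise maximum over the k pieces
    """
    z = np.einsum("kmd,d->km", W, x) + b  # (k, m) affine responses
    return z.max(axis=0)                  # max over the k pieces

def relu(z):
    """ReLU for comparison: max(z, 0), i.e. a maxout with one piece pinned to 0."""
    return np.maximum(z, 0.0)

# Illustrative usage with random, hypothetical sizes.
rng = np.random.default_rng(0)
d, m, k = 8, 4, 3
x = rng.standard_normal(d)
W = rng.standard_normal((k, m, d))
b = rng.standard_normal((k, m))
print(maxout(x, W, b))          # m maxout activations
print(relu(W[0] @ x + b[0]))    # ReLU on a single affine response, for comparison
```

In a convolutional maxout layer this maximum is typically taken across groups of k feature maps produced by the convolution; the exact layer configuration used in the paper is not given in the abstract.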

Updated: 2021-05-05