Performance Optimization of Speech Recognition System with Deep Neural Network Model
Optical Memory and Neural Networks Pub Date : 2019-02-01 , DOI: 10.3103/s1060992x18040094
Wei Guan

Abstract

With the development of the internet, man-machine interaction has become increasingly important, and precise speech recognition has become an important means of achieving it. In this study, deep neural network models were used to enhance speech recognition performance. Four architectures were studied: the feedforward fully connected deep neural network, the time-delay neural network, the convolutional neural network, and the feedforward sequence memory neural network. Their speech recognition performance was evaluated by comparing their acoustic models. The recognition performance of the models after adding human voice features of different dimensions was also tested. The results showed that deep neural network models could effectively improve the performance of the speech recognition system. The feedforward sequence memory neural network performed best, followed by the deep neural network, the time-delay neural network, and the convolutional neural network. Different extracted features improved model performance to different degrees: models using Fbank features outperformed those using Mel-frequency cepstral coefficient (MFCC) features. Model performance improved after vocal characteristics were added, and the vocal characteristic dimension differed between models.
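The abstract compares Fbank and MFCC features but does not give the extraction settings. As a rough illustration of how the two feature types relate, the sketch below computes log Mel filterbank energies (Fbank) for one frame and then derives MFCCs from them with a DCT-II. Everything specific here is an assumption for illustration and not taken from the paper: the function names, the 16 kHz sample rate, the 25 ms frame, the Hamming window, the 40 filters, and the 13 cepstral coefficients. A naive DFT is used so that the example needs only the standard library.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """Power spectrum of one Hamming-windowed frame via a naive DFT.

    The frame is implicitly zero-padded to n_fft samples.
    Slow, but dependency-free.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n_fft)
                 for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n_fft)
                  for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    filters = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        filters.append(row)
    return filters

def fbank(frame, sample_rate, n_filters=40, n_fft=512):
    """Log mel filterbank energies (Fbank) for a single frame."""
    spec = power_spectrum(frame, n_fft)
    filters = mel_filterbank(n_filters, n_fft, sample_rate)
    # Floor the energy before the log so empty filters do not produce -inf.
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
            for row in filters]

def mfcc(log_energies, n_ceps=13):
    """Unnormalized DCT-II of the Fbank vector, keeping the first n_ceps terms."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n)
                for j, e in enumerate(log_energies))
            for c in range(n_ceps)]

# Illustrative input: one 25 ms frame of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(400)]
fb = fbank(frame, sample_rate)   # 40 Fbank features
cc = mfcc(fb)                    # 13 MFCC features derived from them
print(len(fb), len(cc))
```

In this sketch the MFCC vector is a truncated, decorrelating transform of the Fbank vector, so Fbank features retain per-band detail that the truncation discards. That is one commonly cited reason neural acoustic models may do well on Fbank input, though the abstract itself does not state why Fbank performed better.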


Updated: 2019-02-01