English speech recognition based on deep learning with multiple features
Computing (IF 3.7), Pub Date: 2019-08-26, DOI: 10.1007/s00607-019-00753-0
Zhaojuan Song

English is one of the most widely used languages. As the global village shrinks, smart homes, in-vehicle voice systems, and speech recognition software that use English as the recognition language have gradually entered public view, and their practical accuracy has won over most users. Deep learning, with its hierarchical feature learning and data modeling capabilities, has outperformed shallow learning techniques on many tasks. This paper therefore takes English speech as its research object and proposes a deep learning speech recognition algorithm that combines speech features with speech attributes. First, a deep neural network trained with supervised learning extracts high-level features of the speech: the output of a fixed hidden layer is selected as the new speech feature for the newly generated network, and a GMM–HMM acoustic model is trained on these new features. Second, speech attribute extractors based on deep neural networks are trained for multiple speech attributes, and a deep neural network classifies the extracted attributes into phonemes. Finally, the speech features and the speech attribute features are merged into the same CNN framework by a neural network based on a linear feature fusion algorithm. The experimental results show that the proposed multi-feature English speech recognition algorithm combines the two methods directly and effectively by joining the speech features and the speaker's speech attributes at the input layer of the deep neural network, and that it significantly improves the performance of the English speech recognition system.
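The abstract does not spell out the linear feature fusion step, so the following is only a minimal sketch of one plausible reading: each frame's hidden-layer speech feature and its speech attribute scores are linearly scaled and concatenated into a single input vector for the downstream network. The function name `fuse_frames` and the weight `alpha` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def fuse_frames(
    speech: Sequence[Sequence[float]],
    attributes: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> List[List[float]]:
    """Fuse two per-frame feature streams at the input level.

    Assumption (not from the paper): fusion is a weighted concatenation,
    with the speech stream scaled by `alpha` and the attribute stream
    scaled by `1 - alpha`.
    """
    if len(speech) != len(attributes):
        raise ValueError("both streams must have the same number of frames")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    fused = []
    for speech_frame, attribute_frame in zip(speech, attributes):
        # Scale each stream, then join them into one input vector per frame.
        scaled_speech = [alpha * value for value in speech_frame]
        scaled_attributes = [(1.0 - alpha) * value for value in attribute_frame]
        fused.append(scaled_speech + scaled_attributes)
    return fused
```

Under this assumption, a 2-dimensional speech feature and a 3-dimensional attribute vector yield a 5-dimensional fused frame. In practice the weight would be tuned or learned rather than fixed.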

Updated: 2019-08-26