An emergent deep developmental model for auditory learning, Journal of Experimental & Theoretical Artificial Intelligence

Journal of Experimental & Theoretical Artificial Intelligence (IF 2.2), Pub Date: 2019-10-03, DOI: 10.1080/0952813x.2019.1672795
Dongshu Wang, Yadong Zhang, Jianbin Xin

ABSTRACT Artificial intelligence has greatly improved machine speech recognition, yet machines still fall short of the recognition ability of the human auditory system. Based on known physiological principles of the human auditory system, this paper proposes a novel emergent auditory model that simulates each key part of the human auditory pathway with a deep developmental network (DDN). The model also simulates the context-integration function of the superior colliculus in the thalamus as an additional layer in the DDN. Mel-frequency cepstral coefficients (MFCC) are extracted from the speech signal and used as inputs to the DDN. Unlike previous models, this work emphasises the mechanism by which a system develops emergent representations from its operational experience: the internal, unsupervised neurons of the DDN represent short contexts, and competition among them explains how these internal neurons come to denote different speech contexts without supervision from the external world. Experimental results show that the proposed DDN achieves higher recognition accuracy on English words and phrases than state-of-the-art methods.
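
The abstract says only that MFCC features are extracted from the speech signal and fed to the DDN. It does not give the frame length, hop size, filter count, or number of coefficients. To illustrate what such a front end computes, here is a minimal pure-Python sketch of a conventional MFCC pipeline: pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, and DCT-II. All function names and parameter values (16 kHz audio, 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97, HTK-style mel formula) are assumptions chosen as common defaults. They are not the authors' settings, and this is not the paper's implementation.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct_ii(x, n_out):
    # Orthonormal DCT-II; keep only the first n_out coefficients.
    n = len(x)
    out = []
    for k in range(n_out):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, n_filters=26, n_coeffs=13, preemph=0.97):
    """Return one list of n_coeffs MFCCs per frame (illustrative defaults)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if frame_len > n_fft:
        raise ValueError("n_fft must be at least the frame length")
    # Pre-emphasis: y[t] = x[t] - a * x[t-1] boosts high frequencies.
    emph = [signal[0]] + [signal[t] - preemph * signal[t - 1] for t in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + i] * window[i] for i in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)             # zero-pad to n_fft
        spectrum = fft(frame)[: n_fft // 2 + 1]          # non-negative frequencies
        power = [abs(c) ** 2 / n_fft for c in spectrum]  # periodogram estimate
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        log_e = [math.log(e + 1e-10) for e in energies]  # floor avoids log(0)
        features.append(dct_ii(log_e, n_coeffs))
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]  # 0.1 s at 440 Hz
    feats = mfcc(tone, sr)
    print(len(feats), len(feats[0]))  # 8 frames x 13 coefficients
```

The sketch uses only the standard library so that it runs without extra packages. A practical front end would vectorize these loops with NumPy. Each returned row is one frame's MFCC vector, which is the kind of per-frame input the abstract describes feeding into the DDN.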

Updated: 2019-10-03