Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer
arXiv - CS - Computation and Language Pub Date : 2020-11-23 , DOI: arxiv-2011.11436
Mohammad Soltanian (Department of Computing Sciences, Tampere University, Finland), Junaid Malik (Department of Computing Sciences, Tampere University, Finland), Jenni Raitoharju (Programme for Environmental Information, Finnish Environment Institute, Jyvaskyla, Finland), Alexandros Iosifidis (Department of Electrical and Computer Engineering, Aarhus University, Denmark), Serkan Kiranyaz (Electrical Engineering Department, Qatar University, Qatar), Moncef Gabbouj (Department of Computing Sciences, Tampere University, Finland)

Automatic classification of speech commands has revolutionized human-computer interaction in robotic applications. However, the recognition models employed usually rely on deep learning with complicated networks that are memory- and energy-hungry. There is therefore a need either to compress these complicated models or to use more efficient lightweight models, so that the resulting classifiers can be implemented on embedded devices. In this paper, we take the second approach: we propose a network layer that enhances the speech command recognition capability of a lightweight network, and we demonstrate the result via experiments. The method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both the input and hidden layers. This richer representation improves recognition accuracy, as shown by extensive experiments on the Google Speech Commands (GSC) and Synthetic Speech Commands (SSC) datasets.
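
The abstract does not give the layer's equations, so the following is only a rough illustration of the general idea of a quadratic form added to a linear unit, in the spirit of a second-order Taylor expansion. It is not the authors' layer: the function names `quadratic_unit` and `quadratic_layer`, the parameter layout, and the absence of any activation function are all assumptions made for this sketch.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def quadratic_unit(x: Vector, w: Vector, q: Matrix, b: float) -> float:
    """Hypothetical second-order unit: b + w.x + x^T Q x.

    The first two terms are an ordinary linear neuron; the quadratic
    form x^T Q x adds pairwise feature interactions.
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    quadratic = sum(x[i] * q[i][j] * x[j] for i in range(n) for j in range(n))
    return b + linear + quadratic


def quadratic_layer(x: Vector, ws: Sequence[Vector],
                    qs: Sequence[Matrix], bs: Vector) -> List[float]:
    """Apply one hypothetical quadratic unit per output feature."""
    return [quadratic_unit(x, w, q, b) for w, q, b in zip(ws, qs, bs)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [0.5, -1.0]
    q = [[1.0, 0.0], [0.0, 0.5]]
    print(quadratic_layer(x, [w], [q], [0.1]))
```

With `q` set to all zeros the unit reduces to a plain linear neuron, which is one way to see the quadratic term as an added second-order correction. Note that a full matrix `q` costs a number of parameters quadratic in the input size per output, which matters on embedded devices; how the paper handles this is not stated in the abstract.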

Updated: 2020-11-25