Audio classification using braided convolutional neural networks
IET Signal Processing (IF 1.7), Pub Date: 2020-08-31, DOI: 10.1049/iet-spr.2019.0381
Harsh Sinha, Vinayak Awasthi, Pawan K. Ajmera

Convolutional neural networks (CNNs) perform remarkably well and have substantially advanced the state of the art in image classification. This success has motivated the application of CNNs to auditory data. Recent publications suggest hidden Markov models and deep neural networks for audio classification. This study classifies audio by representing it as spectrogram images and then applying a CNN-based architecture. It presents a strategy for a CNN-based neural architecture that learns a sparse representation imitating the receptive neurons in the primary auditory cortex of mammals. The feasibility of the proposed architecture is assessed on audio classification tasks using standard benchmark datasets: the Google Speech Commands datasets (GSCv1 and GSCv2) and the UrbanSound8K dataset (US8K). The proposed architecture, referred to as the braided convolutional neural network, achieves average recognition accuracies of 97.15%, 95% and 91.9% on GSCv1, GSCv2 and US8K, respectively, outperforming other deep learning architectures.
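
To make the "audio as spectrogram image" step concrete, here is a minimal, dependency-free sketch of a log-magnitude spectrogram. It is not the authors' pipeline: the abstract does not state a frame length, hop size, window or frequency scale, so `frame_len`, `hop`, the Hann window and the function name `log_spectrogram` are all assumptions chosen for illustration. A real pipeline would normally use an FFT library rather than the naive DFT shown here.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a log-power spectrogram as a list of frames (time x frequency).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for bin k; an FFT would be used in practice.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(math.log(abs(acc) ** 2 + eps))
        frames.append(row)
    return frames


# Toy input: a 1 kHz tone sampled at 8 kHz (hypothetical values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
spec = log_spectrogram(tone)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

The resulting time-by-frequency grid is the kind of 2-D array that can be treated as a single-channel image and passed to a CNN classifier.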

Updated: 2020-09-01