Convolutional Neural Network-Gated Recurrent Unit Neural Network with Feature Fusion for Environmental Sound Classification
Automatic Control and Computer Sciences (IF 0.6). Published: 2021-09-02. DOI: 10.3103/s0146411621040106
Yu Zhang, Jinfang Zeng, Youming Li, Da Chen

Abstract

With the widespread application of deep learning models to classification problems, a growing number of researchers have applied them to environmental sound classification (ESC) in recent years. However, existing models that train deep neural networks for ESC on acoustic features such as the log-scaled mel spectrogram (Log mel) and mel-frequency cepstral coefficients (MFCC), or on the raw waveform, still perform unsatisfactorily. In this paper, we first propose fusing multiple features, namely the Log mel, the log-scaled cochleagram, and the log-scaled constant-Q transform, into a single feature set called LMCC. We then present CNN-GRUNN, a network that runs a convolutional neural network (CNN) and a gated recurrent unit neural network (GRUNN) in parallel, to improve ESC performance with the proposed aggregated features. Experiments were conducted on the ESC-10, ESC-50, and UrbanSound8K datasets. The results indicate that feeding LMCC into CNN-GRUNN is well suited to ESC problems: the model achieves classification accuracies of 92.30% on ESC-10, 87.43% on ESC-50, and 96.10% on UrbanSound8K.
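The abstract does not say exactly how the three features are fused or how the outputs of the two parallel branches are combined. The NumPy sketch below illustrates one plausible reading and is not the authors' implementation. It assumes that LMCC means stacking the three log-scaled features as channels of one (3, bands, frames) tensor, that the CNN and GRU branches both read that tensor, and that their outputs are concatenated before a softmax classifier. The function names (`fuse_lmcc`, `conv_branch`, `gru_branch`, `init_params`, `forward`), the single-layer branches, and the random untrained weights are all hypothetical choices for illustration; random arrays stand in for real mel, cochleagram, and constant-Q spectrograms.

```python
# Hypothetical NumPy sketch, NOT the authors' CNN-GRUNN implementation.
# Assumptions: LMCC = three log-scaled features stacked as channels; the CNN and
# GRU branches both read the same (3, bands, frames) tensor and their outputs are
# concatenated before a softmax classifier. Weights are random (untrained).
import numpy as np


def fuse_lmcc(mel, cochleagram, cqt, eps=1e-6):
    """Log-scale three (bands, frames) magnitude features and stack them as channels.

    Assumes all three share the same number of bands; frames are cropped to the
    shortest input, and each channel is z-normalised.
    """
    n_frames = min(f.shape[1] for f in (mel, cochleagram, cqt))
    channels = []
    for feat in (mel, cochleagram, cqt):
        x = np.log(np.maximum(feat[:, :n_frames], 0.0) + eps)  # log compression
        channels.append((x - x.mean()) / (x.std() + 1e-8))
    return np.stack(channels, axis=0)  # (3, bands, frames)


def conv_branch(x, kernels):
    """Valid 2-D cross-correlation, ReLU, then global average pooling.

    x: (C, H, W); kernels: (F, C, kh, kw) -> returns a (F,) feature vector.
    """
    n_filters, _, kh, kw = kernels.shape
    _, H, W = x.shape
    out = np.zeros((n_filters, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0).mean(axis=(1, 2))


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_branch(x, params):
    """Single-layer GRU over time frames; returns the last hidden state.

    Each of the W frames is the flattened (C * H) column of x.
    """
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    seq = x.reshape(x.shape[0] * x.shape[1], x.shape[2]).T  # (W, C*H)
    h = np.zeros(Uz.shape[0])
    for frame in seq:
        z = sigmoid(Wz @ frame + Uz @ h + bz)               # update gate
        r = sigmoid(Wr @ frame + Ur @ h + br)               # reset gate
        h_tilde = np.tanh(Wh @ frame + Uh @ (r * h) + bh)   # candidate state
        h = (1.0 - z) * h + z * h_tilde                     # interpolate old/new
    return h


def init_params(rng, channels, bands, n_filters, hidden, n_classes):
    """Random weights for both branches and the softmax output layer."""
    def mat(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    kernels = mat(n_filters, channels, 3, 3)
    gru = []
    for _ in range(3):  # (W, U, b) for the update, reset and candidate gates
        gru += [mat(hidden, channels * bands), mat(hidden, hidden), np.zeros(hidden)]
    return kernels, gru, mat(n_classes, n_filters + hidden), np.zeros(n_classes)


def forward(x, params):
    """Run both branches in parallel on x and classify the concatenated features."""
    kernels, gru, W_out, b_out = params
    features = np.concatenate([conv_branch(x, kernels), gru_branch(x, gru)])
    logits = W_out @ features + b_out
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Usage with random stand-ins for the three spectrogram-like features.
rng = np.random.default_rng(0)
bands, frames = 32, 40
mel, coch, cqt = (rng.random((bands, frames + k)) for k in range(3))
x = fuse_lmcc(mel, coch, cqt)
probs = forward(x, init_params(rng, 3, bands, n_filters=8, hidden=16, n_classes=10))
print(x.shape, probs.shape)  # (3, 32, 40) (10,)
```

In a trained system both branches would be learned jointly by backpropagation, and the convolutional branch would typically stack several layers with pooling; the single-layer versions here only show how one fused input flows through two parallel paths. The GRU update follows the original GRU formulation, in which `z` weights the new candidate state; framework implementations such as PyTorch's `nn.GRU` differ slightly in where the reset gate is applied and in which state `z` weights.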

Updated: 2021-09-03
