Environmental Sound Classification Based on Stacked Concatenated DNN using Aggregated Features,Journal of Signal Processing Systems


Environmental Sound Classification Based on Stacked Concatenated DNN using Aggregated Features
Journal of Signal Processing Systems (IF 1.8), Pub Date: 2021-09-16, DOI: 10.1007/s11265-021-01702-x
Chengwei Liu (1,2), Feng Hong (1), Haihong Feng (1,2), Yushuang Zhai (1,2), Youyuan Chen (1)


Environmental Sound Classification (ESC) has attracted increasing interest in recent years. It is a challenging non-speech audio event classification problem because acoustic environments are complex. The classification accuracy of conventional methods depends heavily on the robustness of the representative features and the effectiveness of the constructed model, which makes current models poorly adaptable. To address this, a novel ESC scheme based on stacked Deep Neural Networks (DNNs) with multi-dimensional aggregated features is proposed. First, aggregated features composed of time-domain features and time–frequency (TF) domain features are used to capture a more comprehensive representation of sounds. Next, feature reduction based on Principal Component Analysis (PCA) is employed to select the most discriminative representations. Finally, a novel stacked DNN based on ensemble learning and data augmentation is presented to improve the generalization capability of the ESC scheme. Experimental results show that the proposed method is suitable for ESC problems: it achieves accuracies of 96.1% and 98.1% on the ESC-10 and UrbanSound8K datasets, respectively, and outperforms most state-of-the-art methods in ESC tasks in terms of both accuracy and computational burden.
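The first two stages of the scheme (feature aggregation, then PCA-based reduction) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not name the individual features, so zero-crossing rate, RMS energy, and per-band DFT energies are assumed stand-ins, and the helper names (`time_features`, `tf_features`, `aggregate`, `pca`) and the synthetic sine clips are hypothetical. PCA is computed here by power iteration on standardized features, and the stacked DNN stage is omitted.

```python
import cmath
import math
import random


def time_features(x):
    """Assumed time-domain features: zero-crossing rate and RMS energy."""
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:])) / (len(x) - 1)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [zcr, rms]


def tf_features(x, frame=32, bands=4):
    """Assumed TF-domain summary: per-band DFT energy averaged over frames."""
    half = frame // 2
    width = half // bands
    n_frames = len(x) // frame
    totals = [0.0] * bands
    for f in range(n_frames):
        seg = x[f * frame:(f + 1) * frame]
        for k in range(half):
            coeff = sum(s * cmath.exp(-2j * math.pi * k * n / frame)
                        for n, s in enumerate(seg))
            totals[min(k // width, bands - 1)] += abs(coeff) ** 2
    return [t / n_frames for t in totals]


def aggregate(x):
    """Aggregated feature vector: time-domain and TF-domain parts concatenated."""
    return time_features(x) + tf_features(x)


def pca(rows, k, iters=200, seed=0):
    """Project standardized rows onto the top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(d)]
    z = [[(r[j] - mean[j]) / std[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    init = random.Random(seed)
    components = []
    for _ in range(k):
        v = [init.gauss(0, 1) for _ in range(d)]
        for _step in range(iters):
            # Power iteration: repeatedly apply the covariance and renormalize.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * cv[i] for i in range(d))
        # Deflation: remove the found component before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
        components.append(v)
    return [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
            for r in z]


noise = random.Random(0)


def clip(freq, n=256):
    """Synthetic test clip: a noisy sine with `freq` cycles per clip."""
    return [math.sin(2 * math.pi * freq * t / n) + 0.05 * noise.gauss(0, 1)
            for t in range(n)]


clips = [clip(f) for f in (8, 16, 24, 96, 104, 112)]  # three low, three high
features = [aggregate(c) for c in clips]              # 6 clips x 6 features
reduced = pca(features, k=2)                          # 6 clips x 2 components
```

In a full pipeline, the reduced vectors would be the input to the classifier stage. Standardizing before PCA is a design choice of this sketch, made so that the large band energies do not swamp the small zero-crossing and RMS values.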


Updated: 2021-09-16
