Environmental Sound Classification with Parallel Temporal-spectral Attention
arXiv - CS - Sound Pub Date : 2019-12-14 , DOI: arxiv-1912.06808
Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, temporal attention mechanisms have been used in CNNs to capture useful information from the relevant time frames for audio classification, especially for weakly labelled data, where the onset and offset times of the sound events are not provided. In these methods, however, the inherent spectral characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a novel parallel temporal-spectral attention mechanism for CNNs to learn discriminative sound representations; it enhances the temporal and spectral features by capturing the importance of different time frames and frequency bands. Parallel branches are constructed so that temporal attention and spectral attention can be applied separately, in order to mitigate interference from segments in which no sound event is present. Experiments on three ESC datasets and two acoustic scene classification (ASC) datasets show that our method improves classification performance and is also robust to noise.
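
To make the idea of parallel branches concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel time-frequency feature map stored as a nested list `feat[t][f]`, and it scores each time frame and each frequency band by its mean value, which stands in for the learned attention layers a real CNN would use. All function names (`softmax`, `temporal_weights`, `spectral_weights`, `parallel_attention`) and the rule of summing the two branches are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weights(feat):
    """One weight per time frame (assumption: scored by mean over frequency bands)."""
    return softmax([sum(frame) / len(frame) for frame in feat])


def spectral_weights(feat):
    """One weight per frequency band (assumption: scored by mean over time frames)."""
    n_frames = len(feat)
    n_bands = len(feat[0])
    band_means = [
        sum(feat[t][f] for t in range(n_frames)) / n_frames
        for f in range(n_bands)
    ]
    return softmax(band_means)


def parallel_attention(feat):
    """Apply both attentions to the same input and sum the two branches.

    Summation is an illustrative way to combine the branches,
    not necessarily what the paper does.
    """
    w_t = temporal_weights(feat)
    w_f = spectral_weights(feat)
    return [
        [x * w_t[t] + x * w_f[f] for f, x in enumerate(frame)]
        for t, frame in enumerate(feat)
    ]


# Toy input: 3 time frames x 3 frequency bands, with the "event"
# concentrated in frame 1, band 1.
feat = [
    [0.1, 0.1, 0.1],
    [0.2, 3.0, 0.2],
    [0.1, 0.1, 0.1],
]
attended = parallel_attention(feat)
```

Both branches read the same input rather than feeding one into the other, so a frame that contains no sound event is down-weighted by the temporal branch without changing how the spectral branch scores the frequency bands.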

Updated: 2020-05-22