Acoustic Scene Classification with Spectrogram Processing Strategies
arXiv - CS - Sound, Pub Date: 2020-07-06, DOI: arxiv-2007.03781
Helin Wang, Yuexian Zou, Dading Chong

Recently, convolutional neural networks (CNNs) have achieved state-of-the-art performance in the acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then fed to the neural networks. In this paper, we study the problem of efficiently taking advantage of different spectrogram representations through discriminative processing strategies. There are two main contributions. The first is an exploration of how combining multiple spectrogram representations at different stages affects performance, which provides a meaningful reference for effective spectrogram fusion. The second is a set of processing strategies over multiple frequency bands and multiple temporal frames, proposed to make full use of a single spectrogram representation. The proposed spectrogram processing strategies can be easily transferred to any network structure. The experiments are carried out on the DCASE 2020 Task1 datasets, and the results show that our method achieves an accuracy of 81.8% (official baseline: 54.1%) on Task1A and 92.1% (official baseline: 87.3%) on Task1B, both on the officially provided fold 1 evaluation dataset.
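
The abstract does not say how the frequency bands, temporal frames, or fusion stages are defined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a spectrogram stored as a NumPy array of shape (frequency, time), contiguous equal-width splits, and fusion at the input stage by stacking same-shaped representations as channels. All function names, shapes, and band/frame counts are hypothetical.

```python
import numpy as np


def split_frequency_bands(spec, n_bands):
    """Split a (freq, time) spectrogram into contiguous frequency bands."""
    return np.array_split(spec, n_bands, axis=0)


def split_temporal_frames(spec, n_frames):
    """Split a (freq, time) spectrogram into contiguous temporal frames."""
    return np.array_split(spec, n_frames, axis=1)


def stack_representations(specs):
    """Input-stage fusion (an assumption): stack same-shaped spectrograms
    as channels, giving an array of shape (channels, freq, time)."""
    return np.stack(specs, axis=0)


# Random stand-ins for two spectrogram representations of one clip.
# The shape (128 frequency bins, 431 time steps) is illustrative only.
rng = np.random.default_rng(0)
spec_a = rng.standard_normal((128, 431))
spec_b = rng.standard_normal((128, 431))

bands = split_frequency_bands(spec_a, 4)   # four frequency bands
frames = split_temporal_frames(spec_a, 3)  # three temporal frames
fused = stack_representations([spec_a, spec_b])
```

Each band or frame could then be passed to its own sub-network or to a shared one; which choice works better, and at which stage representations are best combined, is the question the paper itself investigates.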

Updated: 2020-07-09