Learning Frame Level Attention for Environmental Sound Classification
arXiv - CS - Sound, Pub Date: 2020-07-12, DOI: arxiv-2007.07241
Zhichao Zhang and Shugong Xu and Shunqing Zhang and Tianhao Qiao and Shan Cao

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on the effectiveness of the representative features extracted from the environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model that focuses on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance when using different attention scaling functions and when applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. The results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized the attention results and observed that the proposed attention mechanism was able to lead the network to focus on the semantically relevant parts of environmental sounds.
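
The abstract does not give the attention formulation, so the following is only a minimal sketch of what frame-level attention over a sequence of per-frame features could look like. The scoring vector `w`, the function name `frame_attention`, and the choice of softmax versus sigmoid as the "scaling function" are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Normalize scores so the frame weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores):
    """Scale each score independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def frame_attention(frames, w, scaling="softmax"):
    """Weight each frame by an attention score and pool over time.

    frames:  list of T feature vectors, each of length D
             (e.g. per-frame outputs of a convolutional RNN).
    w:       length-D scoring vector (hypothetical learned parameter).
    scaling: "softmax" or "sigmoid" (assumed scaling functions).
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    weights = softmax(scores) if scaling == "softmax" else sigmoid(scores)
    # Scale every frame by its weight, then sum over the time axis.
    weighted = [[a * x for x in frame] for a, frame in zip(weights, frames)]
    pooled = [sum(col) for col in zip(*weighted)]
    return weights, pooled


# Toy input: two silent (all-zero) frames around one active frame.
frames = [[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
weights, pooled = frame_attention(frames, w=[1.0, 1.0])
```

In this toy input the active middle frame gets the largest weight, which is the behaviour the abstract describes: silent or irrelevant frames contribute less to the pooled clip-level representation.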

Updated: 2020-07-15