Modeling Label Dependencies for Audio Tagging with Graph Convolutional Network
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3019702
Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

As a multi-label classification task, audio tagging aims to predict the presence or absence of certain sound events in an audio recording. Existing work on audio tagging does not explicitly consider the co-occurrence probabilities between sound events, which this study terms label dependencies. To address this, we propose to model the label dependencies with a graph-based method in which each node of the graph represents a label. An adjacency matrix is constructed by mining the statistical relations between labels to represent the graph structure, and a graph convolutional network (GCN) learns node representations by propagating information between neighboring nodes according to the adjacency matrix, which implicitly models the label dependencies. The generated node representations are then applied to the acoustic representations for classification. Experiments on AudioSet show that our method achieves a state-of-the-art mean average precision (mAP) of 0.434.
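
The pipeline described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not say how the adjacency matrix is derived from the label statistics, how it is normalized, or how the node representations are combined with the acoustic features. The conditional-probability threshold, the symmetric normalization, the two-layer depth, the dot-product scoring, and all function names and dimensions below are assumptions chosen for illustration only.

```python
import numpy as np


def cooccurrence_adjacency(labels, threshold=0.1):
    """Build a binary label graph from multi-hot labels (num_clips, num_labels).

    Assumption: an edge i -> j exists when the empirical P(j | i) reaches
    `threshold`. The abstract only says the matrix is mined from label
    statistics, so this rule is hypothetical.
    """
    labels = labels.astype(np.float64)
    counts = labels.T @ labels                # counts[i, j]: clips tagged with both i and j
    occurrences = np.diag(counts).copy()      # occurrences[i]: clips tagged with i
    cond = counts / np.maximum(occurrences, 1.0)[:, None]  # row i holds P(j | i)
    adj = (cond >= threshold).astype(np.float64)
    np.fill_diagonal(adj, 1.0)                # self-loops keep each node's own features
    return adj


def normalize_adjacency(adj):
    """Scale as D^{-1/2} A D^{-1/2} using row degrees (an assumed choice)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # degrees are >= 1 because of self-loops
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w, relu=True):
    """One graph convolution: aggregate neighbor features, then project."""
    out = a_hat @ h @ w
    return np.maximum(out, 0.0) if relu else out


def tag_scores(acoustic, node_repr):
    """Apply node representations to acoustic embeddings via a dot product.

    acoustic: (batch, dim), node_repr: (num_labels, dim).
    Returns per-label probabilities of shape (batch, num_labels).
    """
    logits = acoustic @ node_repr.T
    return 1.0 / (1.0 + np.exp(-logits))


# Toy usage with made-up data: 4 clips, 3 labels, random weights.
rng = np.random.default_rng(0)
toy_labels = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0]])
a_hat = normalize_adjacency(cooccurrence_adjacency(toy_labels, threshold=0.5))
label_embeddings = rng.normal(size=(3, 8))        # hypothetical initial node features
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 32))
nodes = gcn_layer(a_hat, gcn_layer(a_hat, label_embeddings, w1), w2, relu=False)
acoustic_batch = rng.normal(size=(2, 32))         # stand-in for audio encoder output
scores = tag_scores(acoustic_batch, nodes)        # shape (2, 3)
```

In this sketch the GCN output acts as a set of label-specific classifiers, so labels that frequently co-occur end up with related weight vectors. In practice the weights would be trained jointly with the audio encoder rather than drawn at random.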

Updated: 2020-01-01