Music auto-tagging using scattering transform and convolutional neural network with self-attention
Applied Soft Computing (IF 8.7), Pub Date: 2020-09-04, DOI: 10.1016/j.asoc.2020.106702
Guangxiao Song, Zhijie Wang, Fang Han, Shenyi Ding, Xiaochun Gu

As a branch of machine learning, deep learning has been used to tackle the music auto-tagging problem. Deep learning methods, especially those based on the convolutional neural network (CNN) architecture, have performed well on this multi-label classification task. However, the feature-extraction and preprocessing parts of this architecture need to be improved. In this paper, we propose a deep learning model based on a CNN with the scattering transform and a self-attention mechanism for automatic music tagging. To strike a balance between information integrity and feature extraction in the preprocessing phase, we employ the scattering transform. Then, a multi-layer CNN is used to extract higher-level features from the scattering coefficients. To select better receptive fields of the CNN, a self-attention sub-network is appended to the last layer of the CNN. Experimental results on the MagnaTagATune dataset and the Million Song Dataset (MSD) show that the proposed model is a good choice for the music auto-tagging task: the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision–recall curve (PR-AUC) obtained in this paper surpass those of state-of-the-art models. Furthermore, we visualize the distributions of attention weights, the activations of the CNN, and the ROC-AUC scores for each tag to better understand the model.
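
The abstract reports ROC-AUC and PR-AUC for a multi-label task but does not say how the scores are aggregated across tags or which PR-AUC estimator is used. The sketch below is one plausible reading, not the authors' evaluation code: it assumes macro-averaging over tags and uses average precision as the step-wise estimate of PR-AUC. All function names and the toy data are hypothetical, and it uses only the Python standard library.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC for one tag: the fraction of (positive, negative) clip pairs
    in which the positive clip gets the higher score (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one tag, a step-wise estimate of PR-AUC.
    Assumes scores are distinct; tied scores are ordered arbitrarily."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive clip")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def macro_average(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    y_true: List[List[int]],
    y_score: List[List[float]],
) -> float:
    """Apply `metric` to each tag (column) and average over tags (assumed)."""
    n_tags = len(y_true[0])
    per_tag = [
        metric([row[t] for row in y_true], [row[t] for row in y_score])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags


# Toy data (made up): 4 clips x 2 tags, rows are clips, columns are tags.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.3], [0.1, 0.5]]

print(macro_average(roc_auc, y_true, y_score))            # 0.875
print(macro_average(average_precision, y_true, y_score))
```

Passing `roc_auc` directly for a single column gives the per-tag ROC-AUC, which is the kind of quantity the abstract says is visualized for each tag.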




Updated: 2020-09-04