Music auto-tagging using scattering transform and convolutional neural network with self-attention, Applied Soft Computing



Music auto-tagging using scattering transform and convolutional neural network with self-attention
Applied Soft Computing (IF 8.7), Pub Date: 2020-09-04, DOI: 10.1016/j.asoc.2020.106702
Guangxiao Song, Zhijie Wang, Fang Han, Shenyi Ding, Xiaochun Gu

As a branch of machine learning, deep learning has been applied to the music auto-tagging problem. Deep learning methods, especially those built on convolutional neural network (CNN) architectures, have performed well on this multi-label classification task. However, the preprocessing and feature-extraction stages of this architecture still need improvement. In this paper, we propose a deep learning model for music auto-tagging based on a CNN with a scattering transform and a self-attention mechanism. To balance information integrity against feature extraction in the preprocessing phase, we employ the scattering transform. A multi-layer CNN then extracts higher-level features from the scattering coefficients. To select better receptive fields for the CNN, a self-attention sub-network is appended to its last layer. Experimental results on the MagnaTagATune dataset and the Million Song Dataset (MSD) show that the proposed model is well suited to the music auto-tagging task: its scores for the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision–recall curve (PR-AUC) surpass those of state-of-the-art models. Furthermore, we visualize the distributions of attention weights, the activations of the CNN, and the per-tag ROC-AUC scores to better understand the model.
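The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes per-tag scores are macro-averaged over tags and that PR-AUC is estimated as average precision; the abstract does not say which estimator or averaging the authors used. The function names and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC for one tag, via the pairwise rank (Mann-Whitney U) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative clip")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one tag, one common estimator of PR-AUC.

    Assumption: tied scores are ordered by input position (no tie handling).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive clip")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def macro_average(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    y_true: List[List[int]],
    y_score: List[List[float]],
) -> float:
    """Apply `metric` to each tag column and average the per-tag results."""
    n_tags = len(y_true[0])
    per_tag = [
        metric([row[t] for row in y_true], [row[t] for row in y_score])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags


# Toy data (hypothetical): 4 clips, 2 tags. Rows are clips, columns are tags.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.15], [0.4, 0.1]]

macro_roc_auc = macro_average(roc_auc, y_true, y_score)
macro_pr_auc = macro_average(average_precision, y_true, y_score)
print(f"macro ROC-AUC = {macro_roc_auc:.4f}, macro PR-AUC = {macro_pr_auc:.4f}")
```

In this toy case the first tag is ranked perfectly and the second has one misordered pair, so the macro ROC-AUC is 0.875 and the macro average precision is about 0.9167. Because each tag is scored independently, the same per-tag values could also be reported without averaging.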


Updated: 2020-09-04
