Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition
arXiv - CS - Multimedia, Pub Date: 2021-04-13, arXiv: 2104.06517
Eunjeong Koh, Shlomo Dubnov

Emotion is a complicated notion present in music that is hard to capture even with fine-tuned feature engineering. In this paper, we investigate the utility of state-of-the-art pre-trained deep audio embedding methods for the Music Emotion Recognition (MER) task. Deep audio embedding methods allow us to efficiently capture high-dimensional features in a compact representation. We implement several multi-class classifiers on top of deep audio embeddings to predict emotion semantics in music. We investigate the effectiveness of the L3-Net and VGGish deep audio embedding methods for music emotion inference over four music datasets. The experiments with several classifiers on the task show that the deep audio embedding solutions can improve the performance of previous baseline MER models. We conclude that deep audio embeddings represent musical emotion semantics for the MER task without expert human engineering.
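As a rough illustration of the approach described in the abstract (not the authors' exact pipeline), the sketch below extracts L3-Net-style deep audio embeddings with the open-source openl3 package, mean-pools them into one vector per clip, and fits a simple multi-class emotion classifier with scikit-learn. The file names, emotion labels, and choice of classifier are assumptions made purely for illustration.

```python
# Minimal sketch, assuming openl3 (L3-Net embeddings), soundfile, and scikit-learn
# are installed. Clip paths, emotion labels, and the classifier are illustrative only.
import numpy as np
import openl3
import soundfile as sf
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def clip_embedding(path):
    """Return one fixed-length vector per clip by mean-pooling frame-level embeddings."""
    audio, sr = sf.read(path)
    # 512-d music embeddings, one vector per analysis frame (default hop is 0.1 s)
    emb, _ = openl3.get_audio_embedding(
        audio, sr, content_type="music", input_repr="mel256", embedding_size=512
    )
    return emb.mean(axis=0)

# Hypothetical clip list and per-clip emotion labels
clips = ["clip_001.wav", "clip_002.wav", "clip_003.wav", "clip_004.wav"]
labels = ["happy", "sad", "angry", "relaxed"]

X = np.stack([clip_embedding(p) for p in clips])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

# Any off-the-shelf multi-class classifier can sit on top of the pooled embeddings
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

A VGGish-based variant would follow the same pattern, swapping the embedding extractor while keeping the classifier stage unchanged.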

Updated: 2021-04-15