Emotional classification of music using neural networks with the MediaEval dataset
Personal and Ubiquitous Computing (IF 3.006), Pub Date: 2020-04-15, DOI: 10.1007/s00779-020-01393-4
Yesid Ospitia Medina, José Ramón Beltrán, Sandra Baldassarri

The proven ability of music to transmit emotions has driven growing interest in new algorithms for music emotion recognition (MER). In this work, we present an automatic system for the emotional classification of music based on a neural network. The work builds on a previous implementation of a dimensional emotion prediction system in which a multilayer perceptron (MLP) was trained with the freely available MediaEval database. Although those earlier results are good in terms of prediction metrics, the valence and arousal values predicted by the network are not good enough to yield a classification by quadrant, mainly because of the class imbalance in the dataset. To improve classification, a pre-processing phase was implemented to stratify and balance the dataset. Three classifiers were compared: linear support vector machine (SVM), random forest, and MLP. The MLP gives the best results, with an averaged F-measure of 50% in a four-quadrant classification scheme. Two binary classification approaches are also presented: a one-vs.-rest (OvR) approach over the four quadrants, and separate binary classifiers for valence and arousal. The OvR approach reaches an average F-measure of 69%, and the binary classifiers reach F-measures of 73% for valence and 69% for arousal. Finally, a dynamic classification analysis with different time windows was performed using the temporal annotation data of the MediaEval database. The four-quadrant F-measures are practically constant regardless of the duration of the time window. This work also exposes some limitations related to the characteristics of the dataset, including its size, class balance, annotation quality, and the sound features available.
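As a rough illustration of what "classification by quadrant" and an "averaged F-measure" mean here, the sketch below maps (valence, arousal) pairs to four quadrants and computes an unweighted mean of per-class F-measures. It is not the authors' implementation. The function names, the neutral point `center=0.0` (which assumes values scaled to a symmetric range such as [-1, 1]), the quadrant numbering, and the sample values are all assumptions made for the example.

```python
def quadrant(valence, arousal, center=0.0):
    """Map a (valence, arousal) pair to a quadrant label 1..4.

    Assumed convention: Q1 = positive valence / high arousal,
    Q2 = negative valence / high arousal, Q3 = negative valence / low arousal,
    Q4 = positive valence / low arousal. Values exactly at `center` are
    treated as positive/high, which is an arbitrary choice.
    """
    if arousal >= center:
        return 1 if valence >= center else 2
    return 4 if valence >= center else 3


def macro_f_measure(y_true, y_pred, labels=(1, 2, 3, 4)):
    """Unweighted mean of the per-class F-measures (F1 scores)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up (valence, arousal) pairs, for illustration only.
annotated = [(0.6, 0.4), (-0.5, 0.7), (-0.3, -0.6), (0.2, -0.4)]
predicted = [(0.5, 0.1), (0.1, 0.6), (-0.2, -0.1), (0.3, -0.2)]

y_true = [quadrant(v, a) for v, a in annotated]
y_pred = [quadrant(v, a) for v, a in predicted]
score = macro_f_measure(y_true, y_pred)
```

In this toy data the second prediction has the wrong valence sign, so it lands in quadrant 1 instead of quadrant 2, and the per-class F-measure for quadrant 2 drops to zero. Averaging without class weights makes such a miss on one class count as much as a miss on any other, however many samples each class has.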




Updated: 2020-04-18