Supervised machine learning for audio emotion recognition
Personal and Ubiquitous Computing Pub Date : 2020-04-22 , DOI: 10.1007/s00779-020-01389-0
Stuart Cunningham , Harrison Ridley , Jonathan Weinel , Richard Picking

The field of Music Emotion Recognition has become an established research sub-domain of Music Information Retrieval. Less attention has been directed towards the counterpart domain of Audio Emotion Recognition, which focuses upon the detection of emotional stimuli resulting from non-musical sound. By better understanding how sounds provoke emotional responses in an audience, it may be possible to enhance the work of sound designers. The work in this paper uses the International Affective Digitized Sounds set. A total of 76 features are extracted from the sounds, spanning the time and frequency domains. The features are first subjected to an initial analysis to determine the level of similarity between pairs of features, measured using Pearson's r correlation coefficient, before being used as inputs to a multiple regression model to determine their weighting and relative importance. The features are then used as inputs to two machine learning approaches, regression modelling and artificial neural networks, in order to determine their ability to predict the emotional dimensions of arousal and valence. It was found that a small number of strong correlations exist between the features, and that a greater number of features contribute significantly to the predictive power of emotional valence than of arousal. Shallow neural networks perform significantly better than a range of regression models; the best-performing networks were able to account for 64.4% of the variance in predictions of arousal and 65.4% in the case of valence. These findings are a substantial improvement over results reported in the literature. Several extensions of this research are discussed, including work related to improving the data set as well as the modelling processes.
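A minimal Python sketch of the pipeline the abstract describes (correlation screening of features, multiple regression for feature weighting, and a shallow neural network predicting arousal and valence) is given below. The placeholder data, the 0.8 correlation threshold, the network size, and the use of scikit-learn are illustrative assumptions, not the authors' implementation:

    # Hypothetical sketch of the abstract's pipeline: Pearson correlation
    # screening, multiple regression for feature weighting, and a shallow
    # neural network predicting arousal and valence. Data are placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(167, 76))   # stand-in for the 76 time/frequency features
    arousal = rng.normal(size=167)   # stand-in affective ratings
    valence = rng.normal(size=167)

    # Step 1: pairwise Pearson's r between all 76 features; flag strong pairs.
    corr = np.corrcoef(X, rowvar=False)                      # 76 x 76 matrix
    strong = np.argwhere(np.triu(np.abs(corr) > 0.8, k=1))   # threshold assumed
    print("strongly correlated feature pairs:", len(strong))

    # Step 2: multiple regression to inspect feature weighting and importance.
    weights = LinearRegression().fit(X, valence).coef_

    # Step 3: a shallow (single hidden layer) network per emotional dimension,
    # reporting R^2, i.e. the proportion of variance accounted for.
    for name, y in (("arousal", arousal), ("valence", valence)):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
        scaler = StandardScaler().fit(X_tr)
        net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                           random_state=0)
        net.fit(scaler.transform(X_tr), y_tr)
        print(name, "R^2:", r2_score(y_te, net.predict(scaler.transform(X_te))))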

Updated: 2020-04-23