Music emotion recognition using convolutional long short term memory deep neural networks, Engineering Science and Technology, an International Journal

Engineering Science and Technology, an International Journal (IF 5.1), Pub Date: 2020-11-01, DOI: 10.1016/j.jestch.2020.10.009
Serhat Hizlisoy, Serdar Yildirim, Zekeriya Tufekci

Abstract: In this paper, we propose an approach for music emotion recognition based on a convolutional long short term memory deep neural network (CLDNN) architecture. In addition, we construct a new Turkish emotional music database composed of 124 Turkish traditional music excerpts, each 30 s long, and evaluate the performance of the proposed approach on this database. In addition to standard acoustic features, we utilize features obtained by feeding log-mel filterbank energies and mel frequency cepstral coefficients (MFCCs) to convolutional neural network (CNN) layers. Classification results show that the best performance is obtained when the new feature set is combined with the standard features and classified with the long short term memory (LSTM) + deep neural network (DNN) classifier. The proposed system achieves an overall accuracy of 99.19% with 10-fold cross-validation. Specifically, an improvement of 6.45 points is achieved. The results also show that the LSTM + DNN classifier improves music emotion recognition accuracy by 1.61, 1.61, and 3.23 points compared to the k-nearest neighbor (k-NN), support vector machine (SVM), and Random Forest classifiers, respectively.
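
The percentages in the abstract can be read against the database size of 124 excerpts. The short Python sketch below checks which whole-number excerpt counts reproduce the reported figures. It assumes accuracy is pooled over all 124 excerpts across the 10 folds, which the abstract does not state explicitly; the excerpt counts and the helper name `pct` are inferred for illustration and are not taken from the paper.

```python
# Assumption (not stated in the abstract): accuracy is pooled over all
# 124 excerpts, so every reported figure is a multiple of 1/124.
N_EXCERPTS = 124  # database size given in the abstract


def pct(count, total=N_EXCERPTS):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Reported figure -> excerpt count that reproduces it.
# The counts are inferred here, not reported by the authors.
inferred_counts = {
    99.19: 123,  # overall accuracy
    6.45: 8,     # reported improvement
    1.61: 2,     # gain over k-NN and over SVM
    3.23: 4,     # gain over Random Forest
}

for reported, count in inferred_counts.items():
    print(f"{reported:5.2f} reported, {count}/{N_EXCERPTS} gives {pct(count):.2f}")
```

Under that assumption, 99.19% is consistent with 123 of 124 excerpts classified correctly, and each accuracy point gap corresponds to a small whole number of excerpts, which is worth keeping in mind when comparing classifiers on a database of this size.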

Updated: 2020-11-01
