A Multimodal Music Emotion Classification Method Based on Multifeature Combined Network Classifier
Mathematical Problems in Engineering (IF 1.009), Pub Date: 2020-08-01, DOI: 10.1155/2020/4606027
Changfeng Chen; Qiang Li

To address the shortcomings of single-network classification models, this paper applies a combined CNN-LSTM (convolutional neural network and long short-term memory) network to music emotion classification and proposes a multifeature combined network classifier built on it. The classifier feeds 2D (two-dimensional) features through the CNN-LSTM and 1D (one-dimensional) features through a DNN (deep neural network), compensating for the deficiencies of the original single-feature models. The model uses multiple convolution kernels in the CNN for 2D feature extraction and a BiLSTM (bidirectional LSTM) for sequence processing, and it is applied separately to audio and to lyrics to produce single-modal emotion classification outputs. For audio feature extraction, the music audio is finely segmented and the human voice is separated out to obtain pure background sound clips, from which the spectrogram and LLDs (low-level descriptors) are extracted. For lyrics feature extraction, the chi-squared test vector and the word embeddings extracted by Word2vec are used as the feature representations of the lyrics. Combining the two types of heterogeneous features selected for audio and for lyrics in the classification model can improve classification performance. To fuse the emotional information of the two modalities, music audio and lyrics, this paper proposes a multimodal ensemble learning method based on stacking. Unlike existing feature-level and decision-level fusion methods, this method avoids the information loss caused by direct dimensionality reduction: the original features are converted into label results for fusion, which effectively solves the problem of feature heterogeneity. Experiments on the Million Song Dataset show that the audio classification accuracy of the proposed multifeature combined network classifier reaches 68% and the lyrics classification accuracy reaches 74%. The average multimodal classification accuracy reaches 78%, a significant improvement over the single-modal results.
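
The stacking idea in the abstract, fusing the label results of the audio and lyrics classifiers instead of their raw features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the two emotion labels, the probability values, and above all the meta-learner, which is a simple nearest-centroid stand-in because the abstract does not say which meta-learner the authors use. In a real stacking setup the base-model outputs used to fit the meta-learner would be out-of-fold predictions, so that the meta-learner never sees outputs produced on a base model's own training data.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_meta_features(audio_probs: Sequence[Vector],
                        lyric_probs: Sequence[Vector]) -> List[Vector]:
    """Concatenate the per-class outputs of the two single-modal classifiers.

    Each song becomes one low-dimensional vector of label scores, so the
    heterogeneous raw features (spectrograms, word embeddings, ...) never
    have to be merged directly.
    """
    return [list(a) + list(l) for a, l in zip(audio_probs, lyric_probs)]


def fit_centroids(features: Sequence[Vector],
                  labels: Sequence[str]) -> Dict[str, Vector]:
    """Stand-in meta-learner (assumption): mean meta-feature vector per label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids: Dict[str, Vector], x: Vector) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center: Vector) -> float:
        return sum((c - v) ** 2 for c, v in zip(center, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical held-out outputs [P(happy), P(sad)] from each base classifier.
audio_out = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
lyric_out = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
true_labels = ["happy", "happy", "sad", "sad"]

meta_model = fit_centroids(build_meta_features(audio_out, lyric_out), true_labels)

# A new song where the audio model leans weakly "sad" but the lyrics model
# is confidently "happy"; the meta-learner weighs both label outputs.
new_song = build_meta_features([[0.45, 0.55]], [[0.85, 0.15]])[0]
print(predict(meta_model, new_song))
```

The point of the sketch is only the data flow: each modality is reduced to label-level outputs first, and fusion happens on those outputs. Any trainable classifier could replace the nearest-centroid step.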
Updated: 2020-08-01

 
