Multi-modal emotion prediction system using convergence media and active contents
Personal and Ubiquitous Computing (IF 3.006) Pub Date: 2021-07-26, DOI: 10.1007/s00779-021-01602-8
Kyungyong Chung, Jin-Su Kim

Multimedia delivers large amounts of information to users through various forms of content and information processing, and it is now applied and converged across diverse fields. In particular, in the convergence of movie or TV drama media with information technology, how to visualize or predict emotional changes from diverse multimedia information has been an ongoing research topic. Emotional changes make it possible to analyze the genre of a movie or TV drama, and viewers choose movies according to their preferred storylines. In other words, users select genres that match their sentiment toward the emotional flow of a video and expect customized media reflecting that emotional flow to be recommended. Typical emotion-prediction methods achieve high accuracy when text-based dialog lines are used; nevertheless, when the subtle emotions in those lines are supplemented with voice and video information, prediction accuracy can be increased further. Therefore, this study proposes a system that predicts emotional context primarily from text and then refines the prediction by analyzing the characters' dialog, their voices in a scene, and images of the characters at the corresponding points in time. To predict emotions efficiently, the proposed system extracts from the text data the time information of a character's emotion words, extracts the voice signal of that time section and converts it into a spectrogram, and saves the face image at the same point in time. The imaged spectrogram and the face image for facial-expression analysis are used as input data for a convolutional neural network (CNN), which is then trained. The emotion in a particular paragraph is predicted, from which the emotional flow and the emotion of a particular scene are discerned. The resulting multi-modal emotion prediction system collects the emotions estimated from convergence media and active contents (text, voice, and image) and combines them into a final emotion prediction.
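
As a rough illustration of the pipeline the abstract describes, the following is a minimal sketch in Python, assuming PyTorch and librosa. The helper name spectrogram_image, the layer widths, the input sizes, and the seven-class emotion set are illustrative assumptions, not the architecture or configuration published in the paper.

# Minimal two-branch sketch of the abstract's pipeline: a subtitle-timed audio
# segment becomes a log-mel spectrogram image, which is fused with a face crop
# in a CNN. Shapes, layer widths, and labels are assumptions, not the authors'
# published model.
import librosa
import numpy as np
import torch
import torch.nn as nn

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def spectrogram_image(wav_path, start_s, dur_s, sr=16000, n_mels=128, frames=128):
    # Load the dialog segment located via the emotion word's subtitle timestamp
    # and convert it into a fixed-size log-mel spectrogram "image".
    y, _ = librosa.load(wav_path, sr=sr, offset=start_s, duration=dur_s)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    img = librosa.power_to_db(mel, ref=np.max)                      # (n_mels, T)
    if img.shape[1] < frames:                                       # pad short clips
        img = np.pad(img, ((0, 0), (0, frames - img.shape[1])))
    return torch.from_numpy(img[:, :frames]).float().unsqueeze(0)   # (1, 128, 128)

class Branch(nn.Module):
    # Small CNN feature extractor; the same structure serves both modalities.
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())  # -> 64 * 4 * 4 features

    def forward(self, x):
        return self.net(x)

class MultiModalEmotionCNN(nn.Module):
    # Fuses the spectrogram branch and the face-image branch, then classifies.
    def __init__(self, n_classes=len(EMOTIONS)):
        super().__init__()
        self.audio = Branch(in_ch=1)   # log-mel spectrogram image
        self.face = Branch(in_ch=3)    # RGB face crop from the same time point
        self.head = nn.Linear(2 * 64 * 4 * 4, n_classes)

    def forward(self, spec, face):
        z = torch.cat([self.audio(spec), self.face(face)], dim=1)
        return self.head(z)

if __name__ == "__main__":
    model = MultiModalEmotionCNN()
    spec = torch.randn(1, 1, 128, 128)  # stand-in for a batched spectrogram_image(...)
    face = torch.randn(1, 3, 64, 64)    # stand-in for a cropped face frame
    print(model(spec, face).softmax(dim=1))  # per-emotion probabilities

In the full system as described, the trained branches would receive the spectrogram and face crop extracted at the timestamps of the emotion words, and the per-scene predictions would be aggregated into the emotional flow of the video.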

Updated: 2021-07-27