Recognizing Emotions evoked by Movies using Multitask Learning
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-07-30 , DOI: arxiv-2107.14529
Hassan Hayat, Carles Ventura, Agata Lapedriza

Understanding the emotional impact of movies has become important for affective movie analysis, ranking, and indexing. Methods for recognizing evoked emotions are usually trained on human-annotated data. Concretely, viewers watch video clips and manually annotate the emotions they experienced while watching. The common practice is then to aggregate the different annotations, by computing average scores or by majority voting, and to train and test models on these aggregated annotations. This procedure yields a single aggregated evoked-emotion annotation for each video. However, emotions experienced while watching a video are subjective: different individuals might experience different emotions. In this paper, we model the emotions evoked by videos in a different manner: instead of modeling only the aggregated value, we jointly model the emotions experienced by each viewer and the aggregated value using a multi-task learning approach. Concretely, we propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture. Our results show that the MT approach models each viewer and the aggregated annotation more accurately than methods trained directly on the aggregated annotations. Furthermore, our approach outperforms the current state-of-the-art results on the COGNIMUSE benchmark.
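
The aggregation step and the joint per-viewer plus aggregate targets described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture or loss: the function names, the use of plain numeric scores, the equal weighting of tasks, and the squared-error objective are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_mean(scores):
    """Average the per-viewer scores for one clip (assumes numeric scores)."""
    return mean(scores)


def aggregate_majority(labels):
    """Return the most frequent per-viewer label for one clip.

    Ties go to the label seen first, which is how Counter.most_common
    orders equal counts.
    """
    return Counter(labels).most_common(1)[0][0]


def multitask_targets(per_viewer_scores):
    """Hypothetical multi-task target: one task per viewer plus the aggregate."""
    return list(per_viewer_scores) + [aggregate_mean(per_viewer_scores)]


def multitask_mse(predictions, targets):
    """Equally weighted squared error over all tasks (an assumed loss)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return mean((p - t) ** 2 for p, t in zip(predictions, targets))


if __name__ == "__main__":
    viewer_scores = [0.25, 0.5, 0.75]           # made-up scores from three viewers
    targets = multitask_targets(viewer_scores)  # three viewer tasks + one aggregate task
    predictions = [0.25, 0.5, 0.75, 1.0]        # made-up model outputs
    print(targets, multitask_mse(predictions, targets))
```

A single-task baseline in this sketch would keep only the last element of the target list, which is the information the aggregated-annotation methods are trained on.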

Updated: 2021-08-02