Multimodal Local-Global Attention Network for Affective Video Content Analysis
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4), Pub Date: 2020-08-07, DOI: 10.1109/tcsvt.2020.3014889
Yangjun Ou, Zhenzhong Chen, Feng Wu

With the rapid development of video distribution and broadcasting, affective video content analysis has attracted considerable research and development attention in recent years. Predicting the emotional responses of movie audiences is a challenging task in affective computing, since induced emotions are relatively subjective. In this article, we propose a multimodal local-global attention network (MMLGAN) for affective video content analysis. Inspired by the multimodal integration effect, we extend the attention mechanism to multi-level fusion and design a multimodal fusion unit to obtain a global representation of the affective video. The multimodal fusion unit selects key parts from multimodal local streams in the local attention stage, and captures the distribution of information across time in the global attention stage. Experiments on the LIRIS-ACCEDE dataset, the MediaEval 2015 and 2016 datasets, the FilmStim dataset, the DEAP dataset and the VideoEmotion dataset demonstrate the effectiveness of our approach compared with state-of-the-art methods.
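To make the two-stage idea concrete, the following is a minimal PyTorch sketch of a local-global attention fusion unit in the spirit of the abstract, not the authors' actual implementation: the class name MultimodalFusionUnit, the layer sizes, the tanh scoring, and the softmax-weighted pooling are all assumptions for illustration.

import torch
import torch.nn as nn

class MultimodalFusionUnit(nn.Module):
    """Hypothetical local-global attention fusion unit (illustrative only)."""

    def __init__(self, dims, hidden=128):
        super().__init__()
        # One projection and one local-attention scorer per modality stream.
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.local_score = nn.ModuleList([nn.Linear(hidden, 1) for _ in dims])
        # Global-attention scorer over the concatenated (fused) segments.
        self.global_score = nn.Linear(hidden * len(dims), 1)

    def forward(self, streams):
        # streams: one tensor per modality, each of shape (batch, time, dim).
        fused = []
        for x, proj, score in zip(streams, self.proj, self.local_score):
            h = torch.tanh(proj(x))                     # (B, T, H)
            a = torch.softmax(score(h), dim=1)          # local weights over segments
            fused.append(a * h)                         # emphasize key local parts
        z = torch.cat(fused, dim=-1)                    # (B, T, H * num_modalities)
        g = torch.softmax(self.global_score(z), dim=1)  # global weights across time
        return (g * z).sum(dim=1)                       # global video representation

# Example: 4 videos, 20 segments each, 512-d visual and 128-d audio features.
unit = MultimodalFusionUnit(dims=[512, 128])
video_repr = unit([torch.randn(4, 20, 512), torch.randn(4, 20, 128)])
print(video_repr.shape)  # torch.Size([4, 256])

The key design point the sketch tries to capture is the two stages: local attention re-weights segments within each modality's stream before fusion, and global attention then pools the fused segments over time into a single video-level representation.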

Updated: 2020-08-07