Graph-Based Multi-Interaction Network for Video Question Answering
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2021-01-21, DOI: 10.1109/tip.2021.3051756
Mao Gu, Zhou Zhao, Weike Jin, Richang Hong, Fei Wu

Video question answering is an important task that combines natural language processing and computer vision and requires a machine to understand a video thoroughly. Most existing approaches capture spatio-temporal information in videos with a combination of recurrent and convolutional neural networks. However, most previous work focuses only on salient frames or regions and therefore tends to miss significant details, such as potential location and action relations. In this paper, we propose a new method called Graph-based Multi-interaction Network for video question answering. In our model, a new attention mechanism named multi-interaction is designed to capture both element-wise and segment-wise sequence interactions simultaneously, both between and within the multi-modal inputs. Moreover, we propose a graph-based relation-aware neural network to learn a more fine-grained visual representation that captures the relationships and dependencies between objects across space and time. We evaluate our method on TGIF-QA and two other video QA datasets. The qualitative and quantitative experimental results show the effectiveness of our model, which achieves state-of-the-art performance.
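
The abstract gives no formulas, so the following is only an illustrative sketch of what attending at both the element level and the segment level could look like. It is not the authors' implementation. It assumes plain scaled dot-product attention, mean-pooled fixed-size windows as "segments", and concatenation as the fusion step; every function name and parameter here (`attend`, `segments`, `multi_interaction`, `seg_size`) is hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys):
    """Scaled dot-product attention: each query becomes a weighted mean of the keys."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out

def segments(seq, size):
    """Mean-pool consecutive windows of `size` vectors into segment vectors (an assumption)."""
    pooled = []
    for i in range(0, len(seq), size):
        window = seq[i:i + size]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled

def multi_interaction(question, frames, seg_size=2):
    """Hypothetical fusion: concatenate element-wise and segment-wise attended features."""
    elem = attend(question, frames)                      # question token vs. single frames
    seg = attend(question, segments(frames, seg_size))   # question token vs. frame segments
    return [e + s for e, s in zip(elem, seg)]            # list concatenation, 2 * dim values

# Toy inputs: two question-token vectors and four frame vectors of dimension 2.
question = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
fused = multi_interaction(question, frames)
```

Each row of `fused` holds one question token's view of the video at two granularities: the first half comes from individual frames, the second half from pooled segments. A real model would use learned projections and would also attend within each modality, which this sketch omits.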

Updated: 2021-02-16