Movie fill in the blank by joint learning from video and text with adaptive temporal attention
Pattern Recognition Letters (IF 5.1), Pub Date: 2018-06-30, DOI: 10.1016/j.patrec.2018.06.030
Jie Chen, Jie Shao, Chengkun He

Video understanding is a challenging problem that attracts considerable research attention. Recently, a new task called movie fill-in-the-blank (MovieFIB) was proposed: given a movie clip and a description containing one blank, the goal is to predict the missing word accurately. Previous studies have made many contributions to this problem. However, some do not exploit the relationship between words and video frames, while others treat visual information as essential for predicting the blank word and fail to distinguish the effects of the text before and after the blank. To overcome these limitations, in this paper we propose to use adaptive temporal attention and to fuse text information with attention. We first extract video and word features. Adaptive temporal attention is then used to update the original description, and text information is extracted from the updated description. An attention mechanism is applied to fuse this text information. Finally, we use adaptive temporal attention to predict the blank word. Extensive experiments demonstrate that our model achieves satisfactory performance.
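
The abstract does not give the equations behind adaptive temporal attention, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes scaled dot-product scoring between a text query and per-frame features, plus a hypothetical sigmoid gate (`adaptive_mix`, `gate_vector`) that decides how much to rely on text versus video. All function names, the toy features, and the gating form are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def temporal_attention(frames, query):
    """Soft attention over time: one weight per frame, then a weighted sum.

    Assumption: scaled dot-product scoring; the paper may score differently.
    """
    d = len(query)
    weights = softmax([dot(f, query) / math.sqrt(d) for f in frames])
    context = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]
    return weights, context

def adaptive_mix(context, text_state, gate_vector):
    """Hypothetical adaptive gate: beta in (0, 1) weights text against video."""
    beta = 1.0 / (1.0 + math.exp(-dot(text_state, gate_vector)))
    mixed = [beta * t + (1.0 - beta) * c for t, c in zip(text_state, context)]
    return beta, mixed

# Toy data: three frames with 2-dimensional features and one text query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

weights, context = temporal_attention(frames, query)
beta, mixed = adaptive_mix(context, query, [0.0, 0.0])
```

In this toy setup, frames whose features align with the query receive larger weights, and a zero `gate_vector` makes the gate weight text and video equally. In a trained model the gate parameters would be learned rather than fixed.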




Updated: 2020-03-20