Movie Question Answering via Textual Memory and Plot Graph, IEEE Transactions on Circuits and Systems for Video Technology



Movie Question Answering via Textual Memory and Plot Graph
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4), Pub Date: 2020-03-01, DOI: 10.1109/tcsvt.2019.2897604
Yahong Han, Bo Wang, Richang Hong, Fei Wu

Movies offer a wealth of visual content as well as engaging stories. Existing methods have shown that understanding movie stories from visual content alone remains a hard problem. In this paper, to answer questions about movies, we introduce a new dataset called PlotGraphs as external knowledge. The dataset contains a large amount of graph-based information about movies. In addition, we propose a model that can utilize movie clips, subtitles, and graph-based external knowledge. The model contains two main parts: a layered memory network (LMN) and a plot graph representation network (PGRN). The LMN represents frame-level and clip-level movie content through the fixed word memory module and the adaptive subtitle memory module, respectively, while the PGRN represents the entire graph, including its semantic information and relationships. We first extract words and sentences from the training movie subtitles, and the LMN then learns hierarchically formed movie representations. We conduct extensive experiments on the MovieQA dataset and the PlotGraphs dataset. With only visual content as input, the LMN with frame-level representation obtains a large performance improvement. When subtitles are incorporated into the LMN to form the clip-level representation, we achieve state-of-the-art performance on the online evaluation task of “Video+Subtitles.” After external knowledge is integrated, the performance of the model consisting of the LMN and the PGRN improves further. These results demonstrate that the external knowledge and the proposed model are effective for movie understanding.
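The abstract does not give the LMN equations, so the following is only a generic soft-attention memory read of the kind used in memory networks, not the authors' implementation. It sketches how a frame feature might be re-expressed as a weighted sum of word-memory slots. The names `memory_read`, `word_memory`, and `frame_feature`, and the toy two-dimensional vectors, are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, memory):
    """Attend over memory slots with a query vector.

    Scores each slot by its dot product with the query, normalizes the
    scores with softmax, and returns (weights, weighted sum of slots).
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    readout = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(dim)
    ]
    return weights, readout


# Hypothetical toy data: three word embeddings and one frame feature.
word_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_feature = [2.0, 0.0]

weights, frame_repr = memory_read(frame_feature, word_memory)
print(weights)     # attention over the three word slots
print(frame_repr)  # frame feature expressed through word memory
```

In this toy setup the first and third slots receive equal weight because they have the same dot product with the query. A clip-level module could apply the same read with subtitle sentence embeddings as the memory, although how the paper's adaptive subtitle memory actually updates is not described in the abstract.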


Updated: 2020-03-01
