Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network, IEEE Transactions on Circuits and Systems for Video Technology


IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4), Pub Date: 2020-08-06, DOI: 10.1109/tcsvt.2020.3014775
Zhaoyu Guo, Zhou Zhao, Weike Jin, Zhicheng Wei, Min Yang, Nannan Wang, Nicholas Jing Yuan

Video question generation (VQG) is a challenging task in visual information retrieval: given a sequence of video frames, the model generates questions about them. Existing methods mainly address single-turn video question generation, but a single-turn conversation usually cannot meet the needs of video information acquisition. In this paper, we propose a new framework for single-turn VQG that introduces an attention mechanism to reason over the dialog history, together with a selection mechanism that chooses among the candidate questions generated from each round of dialog history. Within the framework, we leverage a recent video question answering model to predict the answer to each generated question and use the answer quality as a reward to fine-tune our model through reinforcement learning. We also introduce a new task, multi-turn video question generation (M-VQG), which generates multiple questions based on the dialog history and video information to build a conversation step by step. Our method achieves state-of-the-art performance on the single-turn VQG task on two large-scale datasets, YouTube-Clips and TACoS-MultiLevel, and provides a baseline approach for the M-VQG task.
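The reward-driven fine-tuning mentioned in the abstract can be illustrated with a toy REINFORCE update. This is only a minimal sketch under stated assumptions, not the authors' implementation: the policy here is a plain softmax over three candidate questions rather than a neural network, the names `softmax` and `reinforce_step` are hypothetical, and the `rewards` values are made-up numbers standing in for the answer quality that a video question answering model would supply.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, rewards, rng, lr=0.5, baseline=0.0):
    """One REINFORCE update for a categorical policy over candidate questions.

    Samples one candidate, looks up its reward, and moves the logits along
    (reward - baseline) * grad log pi(choice).
    """
    probs = softmax(logits)
    choice = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    advantage = rewards[choice] - baseline
    # For a softmax policy: d log pi(choice) / d logit_i = 1[i == choice] - probs[i]
    new_logits = [
        x + lr * advantage * ((1.0 if i == choice else 0.0) - probs[i])
        for i, x in enumerate(logits)
    ]
    return new_logits, choice


# Assumed answer-quality rewards for three candidate questions (illustrative only).
rewards = [0.1, 0.9, 0.3]
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits, _ = reinforce_step(logits, rewards, rng, baseline=0.4)

probs = softmax(logits)
best = probs.index(max(probs))  # index of the candidate the policy now prefers
```

In this toy setting the policy shifts probability toward the candidate whose answer earns the highest reward. In the paper's setting the same idea would apply to the parameters of the question generator, with the reward computed from the predicted answer.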


Updated: 2020-08-06
