Weakly-Supervised Video Moment Retrieval via Semantic Completion Network,arXiv - CS - Multimedia



arXiv - CS - Multimedia, Pub Date: 2019-11-19, DOI: arxiv-1911.08199
Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang and Huasheng Liu

Video moment retrieval aims to locate the moment in a video that is most relevant to a given natural language query. Existing methods are mostly trained in a fully-supervised setting, which requires full temporal boundary annotations for each query. However, labeling these annotations manually is time-consuming and expensive. In this paper, we propose a novel weakly-supervised moment retrieval framework that requires only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates context information to generate and score all candidate proposals in a single pass. We then devise an algorithm that considers both exploitation and exploration to select the top-K proposals. Next, we build a semantic completion module that measures the semantic similarity between the selected proposals and the query, computes a reward, and provides feedback to the proposal generation module for scoring refinement. Experiments on ActivityCaptions and Charades-STA demonstrate the effectiveness of the proposed method.
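The abstract does not say how exploitation and exploration are combined when the top-K proposals are selected. As a rough illustration only, the sketch below assumes an epsilon-greedy style rule: each pick is either the highest-scoring remaining proposal (exploitation) or a uniformly random remaining one (exploration). The function name `select_top_k`, the `epsilon` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional, Sequence


def select_top_k(scores: Sequence[float], k: int, epsilon: float = 0.2,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Pick up to k distinct proposal indices, mixing greedy and random picks.

    Assumption: an epsilon-greedy rule stands in for the selection algorithm
    the abstract mentions; the paper's actual procedure may differ.
    """
    rng = rng or random.Random()
    # Remaining candidates sorted best-first, so position 0 is the greedy pick.
    remaining = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        if rng.random() < epsilon:
            # Exploration: take any remaining proposal uniformly at random.
            pick = remaining.pop(rng.randrange(len(remaining)))
        else:
            # Exploitation: take the highest-scoring remaining proposal.
            pick = remaining.pop(0)
        chosen.append(pick)
    return chosen


# Hypothetical confidence scores for five candidate proposals.
scores = [0.1, 0.9, 0.4, 0.7, 0.3]
greedy = select_top_k(scores, k=3, epsilon=0.0)  # pure exploitation: [1, 3, 2]
mixed = select_top_k(scores, k=3, epsilon=0.5, rng=random.Random(0))
print(greedy)
print(mixed)
```

With `epsilon=0.0` the function reduces to a plain top-K by score. Larger values let lower-scored proposals reach the downstream reward computation, which is one plausible way for a scoring module to receive feedback on proposals it currently underrates.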


Updated: 2020-01-16
