MRN: Moment Relation Network for Natural Language Video Localization with Transfer Learning
International Journal of Pattern Recognition and Artificial Intelligence (IF 1.5), Pub Date: 2021-02-06, DOI: 10.1142/s0218001421520091
Siyu Jiang, Guobin Wu

In this paper, we tackle the task of natural language video localization (NLVL): given an untrimmed video and a natural language query, the goal is to localize the temporal segment of the video that best matches the query. NLVL sits at the intersection of language and video understanding, and it is challenging because a video may contain multiple segments of interest and the query may describe complicated temporal dependencies. Although existing approaches have achieved good performance, most of them do not fully consider the inherent differences between the language and video modalities. Here, we propose the Moment Relation Network (MRN) to reduce the divergence between the probability distributions of the two modalities. Specifically, MRN trains a video subnet and a language subnet, then uses transfer learning techniques to map the extracted features into a shared embedding space. In that space, the similarity of the two modalities is computed with a Mahalanobis distance metric, and this similarity is used to localize moments. Extensive experiments on benchmark datasets show that the proposed MRN outperforms the state of the art by a large margin under widely used metrics.
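The distance-based localization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the metric is parameterized or how candidate moments are generated, so the linear map `L` (giving a positive semidefinite matrix `M = L.T @ L`), the candidate `spans`, and the function names `mahalanobis_distances` and `localize` are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_distances(query, moments, L):
    """Distance from one query embedding to each candidate moment embedding.

    Assumption: the metric matrix is M = L.T @ L, so that
    d(q, v) = sqrt((q - v)^T M (q - v)) = ||L (q - v)||_2
    and M is positive semidefinite by construction.
    """
    diff = moments - query        # shape (n_moments, dim)
    projected = diff @ L.T        # shape (n_moments, k)
    return np.linalg.norm(projected, axis=1)


def localize(query, moments, spans, L):
    """Return the (start, end) span whose embedding is closest to the query."""
    dists = mahalanobis_distances(query, moments, L)
    best = int(np.argmin(dists))
    return spans[best], dists


if __name__ == "__main__":
    # Hypothetical toy data: three candidate moments in a 4-d shared space.
    rng = np.random.default_rng(0)
    dim = 4
    L = rng.normal(size=(dim, dim))           # stand-in for a learned map
    moments = rng.normal(size=(3, dim))       # stand-in video-moment embeddings
    spans = [(0.0, 5.0), (5.0, 12.0), (12.0, 20.0)]  # seconds, hypothetical
    query = moments[1] + 0.01 * rng.normal(size=dim)  # stand-in query embedding
    best_span, dists = localize(query, moments, spans, L)
    print(best_span, dists)
```

Writing the metric as `M = L.T @ L` is one common way to keep a learned Mahalanobis distance valid; with `L` set to the identity it reduces to the ordinary Euclidean distance.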

Updated: 2021-02-06