Multi-semantic long-range dependencies capturing for efficient video representation learning
Image and Vision Computing (IF 4.7) Pub Date: 2020-08-03, DOI: 10.1016/j.imavis.2020.103988
Jinhao Duan, Hua Xu, Xiaozhu Lin, Shangchao Zhu, Yuanze Du

Capturing long-range dependencies has proven effective for video understanding tasks. However, previous works address this problem in a pixel-pair manner, which can be inaccurate because individual pixel pairs carry too little semantic information. Moreover, those methods introduce considerable computation and additional parameters. Following the feature-aggregation pattern of Graph Convolutional Networks (GCNs), we aggregate pixels together with their neighbors into semantic units, which carry stronger semantic information than pixel pairs. We design an efficient, parameter-free, semantic-unit-based dependency-capturing framework, named the Multi-semantic Long-range Dependencies Capturing (MLDC) block. We verify our method on large-scale, challenging video classification benchmarks such as Kinetics. Experiments demonstrate that our method substantially outperforms pixel-pair-based methods and achieves state-of-the-art performance, without introducing any additional parameters or much computation.
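The contrast the abstract draws — pairwise affinities between individual pixels versus dependencies between pooled "semantic units" — can be illustrated with a rough, hypothetical sketch. The function names, the dot-product affinity, and the use of parameter-free average pooling over fixed groups as the neighbor-aggregation step are all assumptions for illustration, not the paper's actual MLDC block:

```python
import numpy as np

def pixel_pair_dependencies(x):
    """Non-local-style baseline: affinity between every pixel pair.
    x: (N, C) array of features for N pixels."""
    affinity = x @ x.T                                  # (N, N) pairwise dot products
    weights = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over all pixels
    return weights @ x                                  # each pixel attends to all pixels

def semantic_unit_dependencies(x, pool=4):
    """Hypothetical sketch of the semantic-unit idea: aggregate each pixel
    with its neighbors into units via parameter-free average pooling
    (groups of `pool` pixels here), then capture dependencies between
    the far fewer units instead of between all pixel pairs."""
    n, c = x.shape
    units = x[: n - n % pool].reshape(-1, pool, c).mean(axis=1)  # (N/pool, C)
    affinity = x @ units.T                              # (N, N/pool) pixel-to-unit
    weights = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over units
    return weights @ units                              # aggregate unit features back

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 16))                   # 64 pixels, 16 channels
out = semantic_unit_dependencies(feats)
print(out.shape)
```

Note the affinity matrix shrinks from N x N to N x (N/pool), which gestures at why a unit-based formulation can cut computation while adding no learnable parameters.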



