A Coarse-to-Fine Framework for Resource Efficient Video Recognition, International Journal of Computer Vision


A Coarse-to-Fine Framework for Resource Efficient Video Recognition
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-08-18, DOI: 10.1007/s11263-021-01508-1
Zuxuan Wu₁, Yu-Gang Jiang₁, Hengduo Li₂, Larry S Davis₂, Yingbin Zheng₃, Caiming Xiong₄

Deep neural networks have demonstrated remarkable recognition results on video classification; however, great improvements in accuracy come at the expense of large amounts of computational resources. In this paper, we introduce LiteEval for resource-efficient video recognition. LiteEval is a coarse-to-fine framework that dynamically allocates computation on a per-video basis and can be deployed in both online and offline settings. Operating by default on low-cost features computed from images at a coarse scale, LiteEval adaptively determines on the fly when to read in more discriminative yet computationally expensive features. This is achieved through the interaction of a coarse RNN and a fine RNN, together with a conditional gating module that automatically learns when to use more computation conditioned on incoming frames. We conduct extensive experiments on three large-scale video benchmarks, FCVID, ActivityNet and Kinetics, and demonstrate, among other things, that LiteEval offers impressive recognition performance while using significantly less computation in both online and offline settings.
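
The conditional-computation idea in the abstract can be illustrated with a toy sketch. This is not the LiteEval implementation: scalar running averages stand in for the coarse and fine RNNs, the gate here is a fixed sigmoid over a hand-written "surprise" signal rather than a learned module, and the names `coarse_feature`, `fine_feature`, `run_gated` and all constants are assumptions made purely for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coarse_feature(frame):
    # Hypothetical cheap feature: mean over a subsampled frame (every 4th value).
    sub = frame[::4]
    return sum(sub) / len(sub)


def fine_feature(frame):
    # Hypothetical expensive feature: mean over every value in the frame.
    return sum(frame) / len(frame)


def run_gated(frames, w=8.0, b=-3.0, threshold=0.5):
    """Process frames with a cheap path always on and a costly path gated.

    Returns (fine_state, fine_calls), where fine_calls counts how often
    the expensive feature was actually computed.
    """
    coarse_state = 0.0  # stand-in for the coarse RNN's hidden state
    fine_state = 0.0    # stand-in for the fine RNN's hidden state
    fine_calls = 0
    for frame in frames:
        c = coarse_feature(frame)
        # Assumed gate input: how much the cheap feature deviates from
        # what the coarse state has seen so far.
        surprise = abs(c - coarse_state)
        coarse_state = 0.5 * coarse_state + 0.5 * c
        if sigmoid(w * surprise + b) > threshold:
            # Gate open: pay for the expensive feature and update the fine state.
            fine_state = 0.5 * fine_state + 0.5 * fine_feature(frame)
            fine_calls += 1
        # Gate closed: the fine state simply carries over (an assumption here).
    return fine_state, fine_calls


static_video = [[1.0] * 16 for _ in range(8)]
changing_video = [[1.0] * 16 if i % 2 == 0 else [0.0] * 16 for i in range(8)]

_, static_calls = run_gated(static_video)
_, changing_calls = run_gated(changing_video)
print(static_calls, changing_calls)
```

In this sketch a video whose frames never change stops triggering the expensive path after the first couple of frames, while a video that keeps changing triggers it on every frame, which is the per-video allocation of computation the abstract describes.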

Updated: 2021-08-19
