Action Recognition with Dynamic Image Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2017-11-02, DOI: 10.1109/tpami.2017.2769085
Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

We introduce the concept of the dynamic image, a novel compact representation of videos that is useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by means of rank pooling. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of that machine as the representation. We call the resulting representation a dynamic image because it summarizes the video dynamics in addition to appearance. This idea allows any video to be converted into a single image, so that existing CNN models pre-trained on still images can be immediately extended to videos. We also present an efficient approximate rank pooling operator that runs two orders of magnitude faster than the standard ones with no loss in ranking performance and that can be formulated as a CNN layer. To demonstrate the power of the representation, we introduce a novel four-stream CNN architecture that can learn from RGB and optical flow frames as well as from their dynamic image representations. We show that the proposed network achieves state-of-the-art performance, 95.5 and 72.5 percent accuracy, on the UCF101 and HMDB51 datasets, respectively.
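
The abstract does not spell out the approximate rank pooling operator, so the following is only a minimal sketch under stated assumptions. It assumes a pairwise ranking objective that asks later frames to score higher than earlier ones, and takes a single gradient step from a zero initialization; that step reduces to a fixed weighted sum of frames with weights alpha_t = 2t - T - 1. The function names `approx_rank_pool` and `to_uint8` are hypothetical, and the exact weighting used in the paper may differ.

```python
import numpy as np

def approx_rank_pool(frames):
    """Collapse a video of shape (T, H, W, C) into one (H, W, C) array.

    Assumption: frame t (1-indexed) gets weight alpha_t = 2t - T - 1,
    the direction of one gradient step on a pairwise ranking loss
    starting from zero. This is a sketch, not the authors' code.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2.0 * t - T - 1.0  # weights are antisymmetric and sum to zero
    # Weighted sum over the time axis.
    return np.tensordot(alpha, frames, axes=(0, 0))

def to_uint8(img):
    """Min-max rescale to [0, 255] so the result can be fed to an image CNN."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)

# Usage: a random 10-frame "video" of 8x8 RGB frames.
video = np.random.default_rng(0).random((10, 8, 8, 3))
dynamic_image = to_uint8(approx_rank_pool(video))
```

Because the weights sum to zero, a perfectly static video maps to an all-zero array, which is consistent with the claim that the representation captures dynamics in addition to appearance. The operation is a fixed linear map over the time axis, which is why it can be placed inside a network as a layer.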

Updated: 2018-11-05