Second-order Temporal Pooling for Action Recognition
International Journal of Computer Vision (IF 11.6), Pub Date: 2018-08-19, DOI: 10.1007/s11263-018-1111-5
Anoop Cherian, Stephen Gould

Deep learning models for video-based action recognition usually generate features for short clips (each consisting of a few frames). These clip-level features are then aggregated into a video-level representation by computing statistics over them. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolutions of clip-level CNN features computed across the video. Such a descriptor is computationally cheap, yet it naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than first-order descriptors. We also propose higher-order extensions of this scheme that compute correlations after embedding the CNN features in a reproducing kernel Hilbert space. We present experiments on standard benchmarks such as HMDB-51 and UCF-101, on fine-grained datasets such as MPII Cooking Activities and JHMDB, and on the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy.
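
To make the contrast between first- and second-order pooling concrete, here is a minimal NumPy sketch. It assumes the clip-level features of one video are stacked as an array of shape (n_clips, d), and that each feature's trajectory over time is mean-centered and scaled to unit norm before taking inner products, so the second-order descriptor is a d×d correlation matrix. The function names, this normalization, and the RBF kernel used for the higher-order variant are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np


def first_order_pooling(X):
    """Zeroth-order (max) and first-order (average) pooling over clips.

    X has shape (n_clips, d); both results have shape (d,).
    """
    return X.max(axis=0), X.mean(axis=0)


def _normalized_trajectories(X, eps=1e-8):
    """Mean-center each feature's trajectory over time and scale it to unit norm."""
    Z = X - X.mean(axis=0, keepdims=True)
    return Z / (np.linalg.norm(Z, axis=0, keepdims=True) + eps)


def temporal_correlation_pooling(X, eps=1e-8):
    """Second-order pooling sketch (assumed normalization).

    Column j of X is the temporal evolution of CNN feature j across the video.
    Entry (i, j) of the returned (d, d) matrix is the correlation between the
    trajectories of features i and j, i.e. how strongly they co-activate.
    """
    Z = _normalized_trajectories(X, eps)
    return Z.T @ Z


def kernel_correlation_pooling(X, sigma=1.0, eps=1e-8):
    """Hypothetical higher-order variant: an RBF kernel between normalized
    trajectories, i.e. inner products after an implicit RKHS embedding."""
    Z = _normalized_trajectories(X, eps)
    sq = np.sum(Z ** 2, axis=0)
    # Squared Euclidean distances between trajectories, clipped at 0 for round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def vectorize(C):
    """Keep the upper triangle (diagonal included) of a symmetric descriptor."""
    return C[np.triu_indices(C.shape[0])]


rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))  # toy sizes: 16 clips, 8-dim features
mx, avg = first_order_pooling(X)
C = temporal_correlation_pooling(X)
K = kernel_correlation_pooling(X, sigma=0.5)  # sigma chosen arbitrarily
print(avg.shape, C.shape, vectorize(C).shape)  # (8,) (8, 8) (36,)
```

Note that the second-order descriptor has d(d+1)/2 entries rather than the d entries produced by max or average pooling. The sketch omits end-to-end training, any normalization of the final descriptor, and the combination with hand-crafted features that a full pipeline would use.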

Updated: 2018-08-19