Knowledge memorization and generation for action recognition in still images
Pattern Recognition (IF 8) Pub Date: 2021-07-20, DOI: 10.1016/j.patcog.2021.108188
Jian Dong 1, Wankou Yang 1, Yazhou Yao 2, Fatih Porikli 3

Human action recognition in visual data is one of the most fundamental challenges in computer vision. Existing approaches to this goal have been based on video data, often incorporating both color and dynamic-flow information. Nevertheless, the majority of visual data consists of still images; for this reason, recognizing actions in still images is an ultimate objective of visual understanding, with a broad range of applications.

In this paper, we present a novel method that transfers knowledge learned from action videos to images, enabling recognition of the principal action depicted in a still image. Our intuition is that a generative model for knowledge transfer can be learned by exploiting the action videos available at the training stage to bridge images to videos. Based on this, we propose two complementary knowledge-transfer models that use fully connected networks to deliver the knowledge extracted from color and motion-flow sequences to still images. We introduce a weighted reconstruction and classification loss to steer the generation procedure of the networks. In addition, we describe and analyze the influence of different data augmentation techniques, initialization strategies, and weighting coefficients on performance. We observe that knowledge transferred from both color sequences and motion-flow sequences improves still-image human action recognition, and that the latter, which provides complementary dynamic information, yields the larger gain. We evaluate our models on two publicly available video-based human action recognition datasets: UCF101 and HMDB51. To further validate the generalization ability of the proposed solution, we test the models learned on UCF101 on two still-image human action recognition benchmarks: Willow 7 Actions and Sports. Our results demonstrate that the proposed method outperforms the baseline approaches by more than 2%, 3%, and 3% in accuracy on UCF101, HMDB51, and Sports, respectively, and by more than 5% mAP on Willow 7 Actions.
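The abstract does not give implementation details, but the components it names, a fully connected generator steered by a weighted sum of a reconstruction loss and a classification loss, can be sketched concretely. The following is a minimal, hypothetical PyTorch sketch, not the authors' code: all feature dimensions, layer widths, the class count, and the weighting coefficient alpha are illustrative assumptions, and the names TransferNet and weighted_loss are invented here for illustration.

import torch
import torch.nn as nn

class TransferNet(nn.Module):
    """Fully connected network mapping a still-image feature to a generated
    video-branch feature (e.g., from a color or motion-flow stream)."""
    def __init__(self, in_dim=2048, hid_dim=1024, out_dim=2048, num_classes=101):
        super().__init__()
        self.generator = nn.Sequential(
            nn.Linear(in_dim, hid_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hid_dim, out_dim),
        )
        # A classifier on the generated feature supplies the classification
        # term of the loss, pushing generation toward discriminative outputs.
        self.classifier = nn.Linear(out_dim, num_classes)

    def forward(self, img_feat):
        gen_feat = self.generator(img_feat)
        logits = self.classifier(gen_feat)
        return gen_feat, logits

def weighted_loss(gen_feat, target_feat, logits, labels, alpha=0.5):
    """Weighted reconstruction + classification loss; alpha is an assumed
    weighting coefficient of the kind the paper says it analyzes."""
    recon = nn.functional.mse_loss(gen_feat, target_feat)  # match the video feature
    cls = nn.functional.cross_entropy(logits, labels)      # keep it class-discriminative
    return alpha * recon + (1.0 - alpha) * cls

# Toy usage: a batch of 8 image features paired with target video-branch features.
model = TransferNet()
img_feat = torch.randn(8, 2048)
target_feat = torch.randn(8, 2048)  # e.g., pooled RGB or optical-flow features
labels = torch.randint(0, 101, (8,))
gen_feat, logits = model(img_feat)
loss = weighted_loss(gen_feat, target_feat, logits, labels)
loss.backward()

Under this reading, one such network would be trained per video stream (color and motion flow), giving the two complementary transfer models the abstract describes.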




Updated: 2021-07-25