Human action recognition in drone videos using a few aerial training examples
Computer Vision and Image Understanding (IF 4.3). Pub Date: 2021-02-26. DOI: 10.1016/j.cviu.2021.103186
Waqas Sultani, Mubarak Shah

Drones are enabling new forms of human action surveillance due to their low cost and fast mobility. However, using deep neural networks for automatic aerial action recognition is difficult because it requires a large number of aerial human action training videos, and collecting such videos is costly, time-consuming, and difficult. In this paper, we explore two alternative data sources to improve aerial action classification when only a few aerial training examples are available. As the first data source, we resort to video games and collect a large set of aerial game action videos using two gaming engines. For the second data source, we leverage conditional Wasserstein Generative Adversarial Networks (GANs) to generate aerial features from ground videos. Both data sources have limitations: game videos are biased towards specific action categories (e.g., fighting and shooting), and it is not easy to generate discriminative GAN features for all types of actions. We therefore need to efficiently integrate the two data sources with the few available real aerial training videos. To address the heterogeneous nature of the data, we propose a disjoint multitask learning framework and feed the network real and game data, or real and GAN-generated data, in an alternating fashion to obtain an improved action classifier. We validate the proposed approach on two aerial action datasets and demonstrate that features from aerial game videos and those generated by the GAN are extremely useful for improving action recognition in real aerial videos when only a few real aerial training examples are available.
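
The sketch below is a minimal illustration (not the authors' released code) of the disjoint multitask idea described in the abstract: a shared feature extractor with separate classifier heads for the real aerial task and the auxiliary task (game videos or GAN-generated aerial features), trained by alternating one batch from each source. The feature dimension, class counts, layer sizes, and data loaders are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DisjointMultiTaskNet(nn.Module):
    def __init__(self, feat_dim=4096, hidden_dim=512,
                 num_real_classes=8, num_aux_classes=5):
        super().__init__()
        # Shared layers benefit from both data sources.
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        # Disjoint heads, since the real and auxiliary label sets may differ.
        self.real_head = nn.Linear(hidden_dim, num_real_classes)
        self.aux_head = nn.Linear(hidden_dim, num_aux_classes)

    def forward(self, x, task):
        h = self.shared(x)
        return self.real_head(h) if task == "real" else self.aux_head(h)

def train_alternating(model, real_loader, aux_loader, epochs=10, lr=1e-4):
    """Alternate one real-aerial batch with one auxiliary batch per step."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for (xr, yr), (xa, ya) in zip(real_loader, aux_loader):
            for x, y, task in ((xr, yr, "real"), (xa, ya, "aux")):
                opt.zero_grad()
                loss = ce(model(x, task), y)
                loss.backward()
                opt.step()
    return model
```

In this sketch the auxiliary loader could serve either pre-extracted game-video features or features sampled from a conditional Wasserstein GAN; only the shared layers and the real-task head would be used at test time on real aerial videos.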




Updated: 2021-03-10