Adversarial Action Prediction Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2018-11-22. DOI: 10.1109/tpami.2018.2882805
Yu Kong , Zhiqiang Tao , Yun Fu

Unlike after-the-fact action recognition, the action prediction task requires action labels to be inferred from partially observed videos containing incomplete action executions. The task is challenging because such partial videos carry insufficient discriminative information and their temporal structure is damaged. We study this problem and propose an efficient and powerful deep network for learning representative and discriminative features for action prediction. Our approach exploits the abundant sequential context in full videos to enrich the feature representations of partial videos. This information is encoded into latent representations by a variational autoencoder (VAE), and the latents are encouraged to be progress-invariant. Decoding these latent representations with a second VAE reconstructs the information missing from the features extracted from partial videos. An adversarial learning scheme then trains a discriminator to distinguish the reconstructed features from features extracted directly from full videos, aligning the two distributions. A multi-class classifier further encourages the features to be discriminative. Our network jointly learns features and classifiers, producing features specifically optimized for action prediction. Extensive experiments on the UCF101, Sports-1M, and BIT datasets show that our approach substantially outperforms state-of-the-art methods while running significantly faster. The results also show that actions differ in their prediction characteristics: some actions can be predicted correctly even when only the first 10% of a video is observed.
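The pipeline described above (VAE-encode partial-video features, decode them into full-video-like features, then score them with an adversarial discriminator and a multi-class action classifier) can be sketched as a single forward pass. This is a minimal illustrative sketch, not the authors' implementation: all layer shapes, parameter names, and the single-layer encoder/decoder/discriminator/classifier are hypothetical simplifications, and the parameters are random rather than jointly learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: video feature size, VAE latent size, action classes.
feat_dim, latent_dim, n_classes = 64, 16, 5

# Random stand-in parameters; the real network learns all of these jointly.
W_enc = rng.normal(0, 0.1, (feat_dim, 2 * latent_dim))  # outputs [mu, log_var]
b_enc = np.zeros(2 * latent_dim)
W_dec = rng.normal(0, 0.1, (latent_dim, feat_dim))
b_dec = np.zeros(feat_dim)
W_dis = rng.normal(0, 0.1, (feat_dim, 1))               # full-vs-reconstructed
b_dis = np.zeros(1)
W_cls = rng.normal(0, 0.1, (feat_dim, n_classes))       # action classifier head
b_cls = np.zeros(n_classes)

def encode(x):
    """VAE encoder: map partial-video features to a latent distribution."""
    h = linear(x, W_enc, b_enc)
    mu, log_var = h[:, :latent_dim], h[:, latent_dim:]
    # Reparameterization trick: sample z = mu + sigma * eps.
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)
    return z, mu, log_var

def decode(z):
    """Second VAE's decoder: reconstruct full-video-like features."""
    return relu(linear(z, W_dec, b_dec))

def discriminate(f):
    """Adversary: probability that features came from a full video."""
    return 1.0 / (1.0 + np.exp(-linear(f, W_dis, b_dis)))

def classify(f):
    """Multi-class head over the reconstructed features."""
    return softmax(linear(f, W_cls, b_cls))

# Forward pass on a batch of 4 partial-video feature vectors.
partial_feats = rng.normal(size=(4, feat_dim))
z, mu, log_var = encode(partial_feats)
recon_feats = decode(z)            # missing information "filled in"
d_score = discriminate(recon_feats)  # adversarial loss pushes these toward 1
probs = classify(recon_feats)        # cross-entropy loss over action labels
```

In training, the discriminator would also see features from full videos, and the generator (the two VAEs), discriminator, and classifier losses would be optimized jointly, as the abstract describes.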

Updated: 2024-08-22