Progressive Motion Representation Distillation With Two-Branch Networks for Egocentric Activity Recognition, IEEE Signal Processing Letters


IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-07-22, DOI: 10.1109/lsp.2020.3011326
Tianshan Liu, Rui Zhao, Jun Xiao, Kin-Man Lam

Video-based egocentric activity recognition involves fine-grained spatio-temporal human-object interactions. State-of-the-art methods, built on a two-branch architecture, rely on pre-calculated optical flows to provide motion information. However, this two-stage strategy is computationally intensive, storage-demanding, and not task-oriented, which hinders its deployment in real-world applications. Although there have been numerous attempts to replace optical flows with other motion representations, most of these methods were designed for third-person activities and do not capture fine-grained cues. To tackle these issues, in this letter, we propose a progressive motion representation distillation (PMRD) method, based on two-branch networks, for egocentric activity recognition. We exploit a generalized knowledge distillation framework to train a hallucination network, which receives RGB frames as input and produces motion cues under the guidance of the optical-flow network. Specifically, we propose a progressive metric loss, which aims to distill local fine-grained motion patterns at each temporal progress level. To further make the distillation framework concentrate on informative frames, we integrate a temporal attention mechanism into the metric loss. Moreover, a multi-stage training procedure is employed for the efficient learning of the hallucination network. Experimental results on three egocentric activity benchmarks demonstrate the state-of-the-art performance of the proposed method.
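
As a rough illustration of the idea in the abstract, the sketch below shows one way an attention-weighted feature-matching loss between a hallucination (student) network and an optical-flow (teacher) network could be written. It is not the loss defined in the letter: the abstract does not give the formula, so the squared-distance metric, the softmax attention over temporal steps, and every name here (`attention_weighted_distillation_loss`, `student_feats`, `teacher_feats`, `attention_scores`) are assumptions made for illustration. Each "temporal step" stands in loosely for the "temporal progress level" mentioned in the abstract.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attention_weighted_distillation_loss(student_feats, teacher_feats,
                                         attention_scores):
    """Hypothetical distillation loss (an assumption, not the paper's formula).

    student_feats:    per-step feature vectors from the RGB-only student
    teacher_feats:    per-step feature vectors from the optical-flow teacher
    attention_scores: one raw score per temporal step; higher means the
                      step is treated as more informative
    """
    if not (len(student_feats) == len(teacher_feats) == len(attention_scores)):
        raise ValueError("all inputs must cover the same temporal steps")
    weights = softmax(attention_scores)
    return sum(
        w * squared_distance(s, t)
        for w, s, t in zip(weights, student_feats, teacher_feats)
    )


# Toy features: the student matches the teacher except at the second step.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]

uniform_loss = attention_weighted_distillation_loss(student, teacher, [0.0, 0.0, 0.0])
focused_loss = attention_weighted_distillation_loss(student, teacher, [0.0, 10.0, 0.0])
```

In this toy case, raising the attention score of the mismatched step makes that step dominate the loss, which is the intuition behind concentrating distillation on informative frames. In a real system the features and attention scores would come from trained networks rather than hand-written lists.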


Updated: 2020-07-22
