MV2Flow, ACM Transactions on Multimedia Computing, Communications, and Applications


MV2Flow
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.1), Pub Date: 2020-12-31, DOI: 10.1145/3422360
Hezhen Hu ₁, Wengang Zhou ₂, Xingze Li ₁, Ning Yan ₁, Houqiang Li ₂

In video action recognition, motion is a crucial cue and is usually represented by optical flow. However, optical flow is computationally expensive to obtain, which makes it the efficiency bottleneck of traditional action recognition algorithms. In this article, we propose a network called MV2Flow that learns motion representation efficiently from signals in the compressed domain. Three losses are defined to train the network. First, we select the classical TV-L1 flow as proxy ground truth to guide the learning. Second, an unsupervised image reconstruction loss is proposed to further refine the learned motion representation. Third, for the task of action recognition, these two losses are combined with a motion content loss. To evaluate our approach, we conduct extensive experiments on two benchmark datasets, UCF-101 and HMDB-51. The motion representation generated by MV2Flow achieves classification performance on action recognition comparable to that of TV-L1 flow while running over 200× faster. With MV2Flow and a 2D-CNN-based network, we achieve state-of-the-art performance in the compressed domain. With a 3D-CNN-based network, we also achieve accuracy comparable to that of methods in the decoded domain setting, at a higher inference speed.
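The three-loss objective described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the L1 form of each term, the nearest-neighbour warping, and the weights `w_flow`, `w_rec`, and `w_content` are all hypothetical. The abstract does not define the motion content loss, so it enters here as a precomputed scalar.

```python
# Illustrative sketch only: names, loss forms, and weights are assumptions,
# not taken from the MV2Flow paper.

def flow_loss(pred_flow, proxy_flow):
    """Mean absolute error between predicted flow and proxy ground-truth flow.

    Flows are H x W nested lists of (dx, dy) tuples; the proxy would be
    TV-L1 flow in the setting the abstract describes.
    """
    total, count = 0.0, 0
    for pred_row, proxy_row in zip(pred_flow, proxy_flow):
        for (pdx, pdy), (tdx, tdy) in zip(pred_row, proxy_row):
            total += abs(pdx - tdx) + abs(pdy - tdy)
            count += 2
    return total / count


def warp(frame, flow):
    """Backward-warp a grayscale frame by a flow field.

    Uses nearest-neighbour sampling and clamps coordinates at the border.
    """
    height, width = len(frame), len(frame[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(frame[src_y][src_x])
        warped.append(row)
    return warped


def reconstruction_loss(frame_t, frame_next, flow):
    """Mean absolute difference between frame_t and frame_next warped back by flow.

    No flow labels are needed, which is what makes this term unsupervised.
    """
    warped = warp(frame_next, flow)
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(frame_t, warped)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def total_loss(pred_flow, proxy_flow, frame_t, frame_next, content_loss,
               w_flow=1.0, w_rec=1.0, w_content=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (
        w_flow * flow_loss(pred_flow, proxy_flow)
        + w_rec * reconstruction_loss(frame_t, frame_next, pred_flow)
        + w_content * content_loss
    )
```

For a one-row frame whose content shifts one pixel to the right, a flow of `(1.0, 0.0)` at every pixel gives a lower reconstruction loss than a zero flow, which is the signal the unsupervised term is meant to provide.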

Updated: 2020-12-31
