ADCI-Net: an adaptive discriminative clip identification strategy for fast video action recognition
Journal of Electronic Imaging (IF 1.0) · Pub Date: 2021-04-01 · DOI: 10.1117/1.jei.30.2.023030
Zizhao Guo, Xiujuan Zheng, Xinyi Chun, Sancong Ying

The most common approach to video-level classification is to run a model over an entire series of fixed-temporal-length clips drawn from a long video. However, a video usually contains several meaningless sections, and feeding similar clips into the model can lead to suboptimal recognition accuracy and wasted computing resources. To address this issue, we introduce an adaptive discriminative clip identification network that evaluates the relevance of every video clip. We adaptively choose the top-ranked clips as inputs for prediction and filter out irrelevant ones. Specifically, for a given trained, cumbersome convolutional neural network action recognition model, we use a lightweight hallucination network (H-Net) to learn its generalization ability via distillation. The relevance of each clip is then evaluated from the features imitated by H-Net, so heavy computation on overlapping or meaningless clips can be avoided during inference. We validate our approach on two datasets, UCF101 and HMDB51. Our strategy can be applied to any clip-based action recognition model. Experimental results demonstrate that on UCF101 we reduce the computational cost by 70% while increasing accuracy.
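The pipeline in the abstract can be sketched in two pieces: a distillation-style objective under which the lightweight H-Net imitates the cumbersome model's clip features, and a selection step that keeps only the top-ranked clips for inference. This is an illustrative sketch, not the paper's implementation: the function names, the plain MSE mimic loss, and the fixed keep ratio are all assumptions, and in the paper the relevance scores are derived from H-Net's imitated features rather than given directly.

```python
import numpy as np

def feature_mimic_loss(student_feats, teacher_feats):
    """Distillation-style objective: the lightweight H-Net learns to
    imitate the cumbersome teacher model's clip features.
    Plain MSE is used here as a stand-in for the paper's loss."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

def select_discriminative_clips(relevance_scores, keep_ratio=0.3):
    """Rank clips by relevance and keep only the top fraction.
    In the paper the scores would come from H-Net's imitated features;
    here they are supplied directly for illustration."""
    n = len(relevance_scores)
    k = max(1, int(np.ceil(n * keep_ratio)))
    # indices of the k highest-scoring clips
    top = np.argsort(relevance_scores)[::-1][:k]
    # restore temporal order before feeding the recognizer
    return np.sort(top)

# Toy example: 10 clips from one video, only three are informative.
scores = np.array([0.10, 0.05, 0.90, 0.20, 0.85,
                   0.10, 0.10, 0.70, 0.15, 0.05])
kept = select_discriminative_clips(scores, keep_ratio=0.3)
print(kept)  # clips 2, 4, and 7 carry the action
```

With a keep ratio of 0.3, the recognizer processes only 3 of the 10 clips, which is the source of the computational savings the abstract reports; the accuracy claim rests on the kept clips being the discriminative ones.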

Updated: 2021-04-29