Video classification by fusing two-stream image template classification and pretrained network, Journal of Electronic Imaging


Journal of Electronic Imaging (IF 1.0), Pub Date: 2020-09-01, DOI: 10.1117/1.jei.29.5.053011
Saeedeh Zebhi, Seyed M. T. AlModarresi, Vahid Abootalebi

A motion energy image (MEI) is a spatial template that collapses regions of motion into a single image in which pixels with more motion are brighter than others. The forward single-step history image (fSHI) is a spatiotemporal template that shows the presence and direction of motion. Each video can be described using these templates. The recent popularity of deep learning architectures for human activity recognition encourages us to explore the effectiveness of combining them with these templates. Hence, three new methods are introduced that convert the problem of human activity recognition in video into image template classification. In method 1, each video is split into N groups of consecutive frames, and the MEI is computed for each group. Transfer learning with fine-tuning is used to classify these templates. Method 2 follows the same procedure as method 1 but classifies fSHIs, the spatiotemporal templates, instead. Method 3 fuses the two streams of these templates. Method 3 outperforms the other two and is designated the proposed method. It achieves recognition accuracies of 92.60% and 93.40% on the UCF Sport and UCF-11 action datasets, respectively. It is also compared with state-of-the-art approaches, and the results show that the proposed method has the best performance.
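The grouping step of method 1 can be illustrated with a minimal NumPy sketch. The abstract does not give the authors' exact MEI formulation, so everything here is an assumption made for illustration: the function name `motion_energy_images`, the frame-difference `threshold`, and the choice to count how often each pixel moves and rescale the counts to 0-255 (so that pixels with more motion come out brighter).

```python
import numpy as np

def motion_energy_images(frames, n_groups, threshold=15):
    """Compute one MEI-style template per group of consecutive frames.

    frames: array-like of shape (T, H, W), grayscale values in 0..255.
    Returns a uint8 array of shape (n_groups, H, W).
    The threshold and the count-based brightness are illustrative
    assumptions, not the paper's stated formulation.
    """
    # Use a signed type so frame differences do not wrap around.
    frames = np.asarray(frames, dtype=np.int16)
    # Binary motion mask for each pair of consecutive frames: (T-1, H, W).
    moving = np.abs(np.diff(frames, axis=0)) > threshold

    templates = []
    # Split the motion masks into n_groups runs of consecutive time steps.
    for group in np.array_split(moving, n_groups, axis=0):
        # Count how many times each pixel moved within this group.
        counts = group.sum(axis=0).astype(np.float64)
        peak = counts.max()
        # Rescale so the most active pixel is 255; keep zeros if no motion.
        scaled = counts / peak * 255.0 if peak > 0 else counts
        templates.append(scaled.astype(np.uint8))
    return np.stack(templates)
```

Each returned template is an ordinary grayscale image, so it could be passed to a pretrained image classifier for fine-tuning, as the abstract describes for method 1.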


Updated: 2020-09-25
