A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes
The Journal of Supercomputing (IF 3.3) Pub Date: 2021-07-13, DOI: 10.1007/s11227-021-03957-4
Muhammad Bilal, Muazzam Maqsood, Sadaf Yasmin, Najam Ul Hasan, Seungmin Rho

Deep learning-based solutions for computer vision have made life easier for humans. Video data contain a great deal of hidden information and patterns that can be exploited for Human Action Recognition (HAR). HAR applies to many areas, such as behavior analysis, intelligent video surveillance, and robotic vision. Occlusion, viewpoint variation, and illumination changes are some of the issues that make the HAR task more difficult. Some action classes contain similar actions or overlapping parts, and this, among many other problems, is the main contributor to misclassification. Traditional hand-engineered and machine learning-based solutions lack the ability to handle overlapping actions. In this paper, we propose a deep learning-based spatiotemporal HAR framework for overlapping human actions in long videos. Transfer learning techniques are used for deep feature extraction: fine-tuned pre-trained CNN models learn the spatial relationships at the frame level. An optimized deep autoencoder is used to squeeze the high-dimensional deep features, and an RNN with LSTM units is used to learn the long-term temporal relationships. An iterative module added at the end fine-tunes the trained model on new videos so that it learns and adapts to changes. Our proposed framework achieves state-of-the-art performance for spatiotemporal HAR of overlapping human actions in long visual data streams from non-stationary surveillance environments.
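The three-stage pipeline the abstract describes (pre-trained CNN features per frame, an autoencoder that compresses them, and an LSTM over the compressed sequence) can be sketched in NumPy. Everything below is an illustrative stand-in, not the paper's implementation: the dimensions, weight initializations, and function names (`cnn_features`, `encode`, `lstm_forward`) are assumptions chosen only to show how the stages compose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 30 frames, 4096-d CNN
# features compressed to 256-d codes, a 128-d LSTM state, 10 classes.
T, D_CNN, D_ENC, D_LSTM, N_CLASSES = 30, 4096, 256, 128, 10

def cnn_features(frames, W_cnn):
    """Stand-in for the fine-tuned pre-trained CNN: maps each flattened
    frame to a deep feature vector (here a fixed linear projection)."""
    return frames @ W_cnn

def encode(features, W_enc):
    """Encoder half of the deep autoencoder: squeezes high-dimensional
    deep features into compact per-frame codes."""
    return np.tanh(features @ W_enc)

def lstm_forward(x_seq, Wx, Wh, b):
    """Minimal LSTM over the encoded frame sequence; the final hidden
    state summarises long-term temporal structure."""
    h, c = np.zeros(D_LSTM), np.zeros(D_LSTM)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in x_seq:
        gates = x @ Wx + h @ Wh + b            # all four gates at once
        i, f, o, g = np.split(gates, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                       # update cell state
        h = o * np.tanh(c)                      # update hidden state
    return h

# Randomly initialised weights stand in for trained parameters.
W_cnn = rng.standard_normal((112 * 112, D_CNN)) * 0.01
W_enc = rng.standard_normal((D_CNN, D_ENC)) * 0.01
Wx = rng.standard_normal((D_ENC, 4 * D_LSTM)) * 0.01
Wh = rng.standard_normal((D_LSTM, 4 * D_LSTM)) * 0.01
b = np.zeros(4 * D_LSTM)
W_cls = rng.standard_normal((D_LSTM, N_CLASSES)) * 0.01

frames = rng.standard_normal((T, 112 * 112))    # flattened frame pixels
feats = cnn_features(frames, W_cnn)             # (T, 4096) spatial features
codes = encode(feats, W_enc)                    # (T, 256) compressed codes
h_final = lstm_forward(codes, Wx, Wh, b)        # (128,) temporal summary
logits = h_final @ W_cls                        # per-class scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over action classes
```

The design point the sketch makes concrete is the autoencoder's role: compressing 4096-d features to 256-d before the LSTM cuts the recurrent weight matrices by more than an order of magnitude, which matters for long videos.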




Updated: 2021-07-13