Deeply-learned and spatial–temporal feature engineering for human action understanding, Future Generation Computer Systems



Deeply-learned and spatial–temporal feature engineering for human action understanding
Future Generation Computer Systems (IF 7.5), Pub Date: 2021-05-15, DOI: 10.1016/j.future.2021.04.021
Hechuang Wang

Accurately recognizing human actions is a key technique in many AI applications, such as visual tracking and human–computer interaction. Low-level local spatial–temporal features are limited, and mid-level features are only weakly descriptive. To address both problems, we propose a novel human action understanding framework that leverages spatial–temporal depth features. More specifically, because regions of vigorous motion encode highly discriminative information for action recognition, we use depth cues from the video frames to identify the salient regions for each person. The optical flow characteristics within these regions serve as an energy function that measures how representative each region is. A deep learning architecture then uses this energy function to identify the motion regions, so that the sample points are concentrated in areas of intense movement. The collected sample points are used as the learned features that capture the human action. Based on these learned deep features, an SVM classifier identifies the different human actions. Comprehensive experiments show that our human action recognition algorithm reaches an average recognition accuracy of 92% and is highly robust to complicated backgrounds.
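The abstract does not give the form of the energy function or the sampling rule, so the following is only a minimal sketch of the general idea of concentrating sample points where motion is intense. It assumes, purely for illustration, that the energy at a pixel is the squared magnitude of its optical flow vector and that points are drawn with probability proportional to that energy. The names `motion_energy` and `sample_points` are hypothetical, and the deep learning stage and the SVM classifier are not modeled here.

```python
import random


def motion_energy(flow):
    """Per-pixel energy, assumed here to be the squared magnitude
    u*u + v*v of the optical flow vector (u, v) at that pixel."""
    return [[u * u + v * v for (u, v) in row] for row in flow]


def sample_points(energy, n_points, seed=0):
    """Draw n_points (row, col) locations with probability proportional
    to the energy, so samples cluster in areas of intense movement."""
    coords = [(r, c) for r, row in enumerate(energy) for c in range(len(row))]
    weights = [energy[r][c] for (r, c) in coords]
    if sum(weights) == 0:
        raise ValueError("the flow field contains no motion")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.choices(coords, weights=weights, k=n_points)


# Toy 4x4 flow field: static background with two moving pixels.
flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
flow[1][1] = (3.0, 4.0)
flow[1][2] = (1.0, 0.0)

points = sample_points(motion_energy(flow), n_points=50)
# Static pixels have zero weight, so every sample lands on a moving pixel.
print(all(p in {(1, 1), (1, 2)} for p in points))
```

In a real pipeline the flow field would come from an optical flow estimator and the sampled locations would be handed to a feature extractor; both are outside the scope of this sketch.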


Updated: 2021-05-19
