A novel feature extractor for human action recognition in visual question answering
Pattern Recognition Letters (IF 3.9), Pub Date: 2021-04-20, DOI: 10.1016/j.patrec.2021.04.002
Francisco H. dos S. Silva , Gabriel M. Bezerra , Gabriel B. Holanda , J. Wellington M. de Souza , Paulo A.L. Rego , Aloísio V. Lira Neto , Victor Hugo C. de Albuquerque , Pedro P. Rebouças Filho

Recognizing and classifying human actions in video clips is a powerful technology for surveillance applications. However, most state-of-the-art approaches to this task cannot be deployed in real-time applications without introducing critical delay. We therefore propose a fast method for human action recognition in visual question answering, based on a novel feature extractor of our own design, 2D pose estimation, and machine learning techniques. Our extractor derives features from the distances, angles, and positions of detected anatomical keypoints. We evaluated our work on the UCF101 dataset, which comprises 13,320 videos of realistic human actions collected from YouTube. The proposed feature extractor, combined with a Complement Naive Bayes classifier, reached a mean Average Precision (mAP) of 62.03% and processed 5.26 frames per second, proving faster than most methods while achieving a competitive mAP.
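The pipeline described above can be sketched in a few lines. The snippet below is an illustrative approximation, not the authors' exact implementation: given 2D pose keypoints for a frame (e.g. from an off-the-shelf pose estimator), it builds a feature vector of pairwise keypoint distances, joint angles, and normalized positions, then trains scikit-learn's `ComplementNB`. The 17-keypoint layout, the neighbour-based angle definition, and the toy data are assumptions for demonstration.

```python
import numpy as np
from itertools import combinations
from sklearn.naive_bayes import ComplementNB

def frame_features(keypoints):
    """Distance/angle/position features from one frame's 2D keypoints.

    keypoints: (K, 2) array of anatomical keypoint coordinates.
    """
    kp = np.asarray(keypoints, dtype=float)
    # Pairwise Euclidean distances between all keypoints.
    dists = [np.linalg.norm(kp[i] - kp[j])
             for i, j in combinations(range(len(kp)), 2)]
    # Angle at each interior keypoint formed by its two neighbours
    # (a simple stand-in for skeletal joint angles).
    angles = []
    for i in range(1, len(kp) - 1):
        a, b = kp[i - 1] - kp[i], kp[i + 1] - kp[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    # Raw positions, shifted to be non-negative because ComplementNB
    # only accepts non-negative feature values.
    pos = (kp - kp.min(axis=0)).ravel()
    return np.concatenate([dists, angles, pos])

# Toy usage with random "poses" and action labels.
rng = np.random.default_rng(0)
X = np.stack([frame_features(rng.random((17, 2))) for _ in range(40)])
y = rng.integers(0, 3, size=40)
clf = ComplementNB().fit(X, y)
pred = clf.predict(X[:5])
```

Complement Naive Bayes is cheap to train and evaluate, which is consistent with the paper's emphasis on per-frame throughput over heavier video models.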




Updated: 2021-04-23