Multi modal human action recognition for video content matching, Multimedia Tools and Applications


Multimedia Tools and Applications (IF 3.0), Pub Date: 2020-05-29, DOI: 10.1007/s11042-020-08998-0
Jun Guo, Hao Bai, Zhanyong Tang, Pengfei Xu, Daguang Gan, Baoying Liu

Human action recognition (HAR) in videos is a challenging task in computer vision. Conventional methods tend to rely on spatiotemporal or optical representations of video actions. However, optical representations can be ineffective in some real-life situations, such as object occlusion and dim light. To address this issue, this paper presents a novel approach to human action recognition that jointly exploits video and Wi-Fi cues. We leverage the fact that Wi-Fi signals carry discriminative information about human actions and are robust to optical limitations. To validate this idea, we devise a practical framework for HAR and set up a dataset containing both video clips and Wi-Fi Channel State Information (CSI) of human actions. A 3D convolutional neural network is used to extract the video features, and statistical algorithms are used to extract the radio features. After the video and radio features are fused, a classical linear support vector machine is employed as the classifier. Comprehensive experiments on this dataset show accuracy improvements of up to 10%. These findings demonstrate that, with the aid of Wi-Fi CSI, the performance of video action recognition methods can be improved significantly, even under optical limitations.
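
The pipeline described in the abstract (statistical radio features, fusion with video features, then a linear classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' implementation: the abstract does not say which statistics are computed from the Wi-Fi Channel State Information (CSI), how the two feature sets are fused, or how the support vector machine is trained. The sketch assumes five summary statistics per CSI amplitude stream, fusion by simple concatenation, and a linear SVM trained by subgradient descent on the hinge loss. The short `video` vectors are placeholders standing in for 3D CNN embeddings, and all function names and toy values are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def csi_statistical_features(amplitudes: Sequence[float]) -> List[float]:
    """Summarize one CSI amplitude stream (assumed feature set, not from the paper)."""
    return [
        statistics.fmean(amplitudes),
        statistics.pstdev(amplitudes),
        min(amplitudes),
        max(amplitudes),
        statistics.median(amplitudes),
    ]


def fuse(video_features: Sequence[float], radio_features: Sequence[float]) -> List[float]:
    """Fuse the two modalities by concatenation (an assumed fusion scheme)."""
    return list(video_features) + list(radio_features)


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    epochs: int = 1000,
    lr: float = 0.01,
    reg: float = 0.01,
) -> Tuple[List[float], float]:
    """Train a binary linear SVM (labels in {-1, +1}) by hinge-loss subgradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: step on both the regularizer and the hinge term.
                w = [wi - lr * (reg * wi - label * xi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Outside the margin: only the L2 regularizer contributes.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the learned hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy data: (placeholder video embedding, CSI amplitude stream, label).
# Label +1 has strongly fluctuating CSI; label -1 has nearly flat CSI.
samples = [
    ([0.9, 0.1], [1.0, 3.0, 1.0, 3.0], 1),
    ([0.8, 0.2], [0.5, 3.5, 0.5, 3.5], 1),
    ([0.1, 0.9], [2.0, 2.1, 2.0, 2.1], -1),
    ([0.2, 0.8], [1.9, 2.0, 1.9, 2.0], -1),
]

X = [fuse(video, csi_statistical_features(csi)) for video, csi, _ in samples]
y = [label for _, _, label in samples]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice the placeholder embeddings would come from a trained 3D CNN and the classifier from an established SVM library. Concatenation is used here only because it is the simplest fusion that lets a single linear classifier weigh both modalities.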


Updated: 2020-05-29
