Combining CNN streams of dynamic image and depth data for action recognition, Multimedia Systems


Combining CNN streams of dynamic image and depth data for action recognition
Multimedia Systems ( IF 3.9 ) Pub Date : 2020-01-14 , DOI: 10.1007/s00530-019-00645-5
Roshan Singh , Rajat Khurana , Alok Kumar Singh Kushwaha , Rajeev Srivastava

RGB-D sensors are in great demand because they can produce large amounts of multimodal data, such as RGB images and depth maps, which are useful for training deep learning models. This paper proposes a deep learning model that recognizes human activities in a video sequence by combining multiple CNN streams. The proposed work uses dynamic images generated from RGB images and depth maps for three different dimensions. The model is trained on these four streams using VGG Net for action recognition. It is then evaluated and compared with other state-of-the-art methods from the literature on three challenging datasets, namely MSR Daily Activity, UTD-MHAD and CAD-60, in terms of accuracy, error, recall, specificity, precision and F-score. The results show that the proposed method outperforms the other methods.
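
The abstract lists six evaluation measures without saying how they are computed for a multi-class problem. As a minimal sketch, assuming a one-vs-rest reading of a confusion matrix (rows are true classes, columns are predicted classes), the measures could be derived as follows. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return _ratio(correct, total)


def per_class_metrics(confusion):
    """One-vs-rest recall, specificity, precision and F-score per class.

    Assumption (not stated in the abstract): confusion[i][j] counts samples
    whose true class is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # true other, predicted k
        tn = total - tp - fn - fp
        recall = _ratio(tp, tp + fn)
        precision = _ratio(tp, tp + fp)
        metrics.append({
            "recall": recall,
            "specificity": _ratio(tn, tn + fp),
            "precision": precision,
            "f_score": _ratio(2 * precision * recall, precision + recall),
        })
    return metrics


if __name__ == "__main__":
    # Toy 3-class confusion matrix, invented purely for illustration.
    toy = [[5, 1, 0],
           [1, 3, 1],
           [0, 0, 4]]
    accuracy = overall_accuracy(toy)
    print("accuracy:", accuracy, "error:", 1.0 - accuracy)
    for label, m in enumerate(per_class_metrics(toy)):
        print(label, m)
```

Error is taken here as one minus accuracy. Whether the paper reports per-class values or averages them across classes is not stated in the abstract, so the sketch leaves the per-class values unaggregated.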


Updated: 2020-01-14
