Gesture Recognition Based on Deep Deformable 3D Convolutional Neural Networks
Pattern Recognition (IF 8) Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107416
Yifan Zhang, Lei Shi, Yi Wu, Ke Cheng, Jian Cheng, Hanqing Lu

Abstract Dynamic gesture recognition, which plays an essential role in human-computer interaction, has been widely investigated but not yet fully addressed. The challenge is mainly threefold: 1) modeling both the spatial appearance and the temporal evolution simultaneously; 2) handling interference from varied and complex backgrounds; 3) meeting the requirement of real-time processing. In this paper, we address the above challenges by proposing a novel deep deformable 3D convolutional neural network for end-to-end learning, which not only achieves impressive accuracy on challenging datasets but also meets the requirement of real-time processing. We propose three types of very deep 3D CNNs for gesture recognition, which can directly model spatiotemporal information with their inherent hierarchical structure. To eliminate background interference, a lightweight spatiotemporal deformable convolutional module is specially designed to augment the spatiotemporal sampling locations of the 3D convolution by learning additional offsets from the preceding feature map. It not only diversifies the shape of the convolution kernel to better fit the appearance of the hands and arms, but also helps the models pay more attention to the discriminative frames in the video sequence. The proposed method is evaluated on three challenging datasets, EgoGesture, Jester and Chalearn-IsoGD, and achieves state-of-the-art performance on all of them. Our model ranked first on Jester's official leaderboard as of the submission time. The code and the trained models are released for better communication and future work 1 .
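The core mechanism described above, augmenting the sampling locations of a 3D convolution with learned offsets, can be sketched in plain NumPy. The following is a minimal illustration (not the authors' released implementation): it computes a deformable 3D convolution at a single output position by shifting each of the 3×3×3 kernel taps by a per-tap fractional offset and sampling the feature map with trilinear interpolation. In the actual network the offsets would be predicted by an auxiliary convolution over the preceding feature map; here they are simply passed in as an array.

```python
import numpy as np

def trilinear_sample(feat, t, y, x):
    """Trilinearly interpolate feat (shape T,H,W) at fractional coords (t, y, x).
    Out-of-bounds corners contribute zero, as in standard deformable convolution."""
    T, H, W = feat.shape
    t0, y0, x0 = int(np.floor(t)), int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dt in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                tt, yy, xx = t0 + dt, y0 + dy, x0 + dx
                if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                    # weight is the product of 1D linear interpolation weights
                    w = (1 - abs(t - tt)) * (1 - abs(y - yy)) * (1 - abs(x - xx))
                    val += w * feat[tt, yy, xx]
    return val

def deformable_conv3d_point(feat, weight, offsets, center):
    """Deformable 3D convolution evaluated at one output position.

    feat:    (T, H, W) single-channel feature map
    weight:  (3, 3, 3) kernel
    offsets: (3, 3, 3, 3) learned (dt, dy, dx) offset for each kernel tap
    center:  (t, y, x) output position
    """
    ct, cy, cx = center
    out = 0.0
    for kt in range(3):
        for ky in range(3):
            for kx in range(3):
                dt, dy, dx = offsets[kt, ky, kx]
                # regular grid location (center + kernel offset) plus learned offset
                out += weight[kt, ky, kx] * trilinear_sample(
                    feat,
                    ct + kt - 1 + dt,
                    cy + ky - 1 + dy,
                    cx + kx - 1 + dx,
                )
    return out
```

With all offsets set to zero this reduces exactly to a standard 3D convolution, which is the sense in which the deformable module "augments" rather than replaces the regular sampling grid; nonzero temporal offsets let a tap reach toward more discriminative frames, while spatial offsets let the kernel follow the hands and arms.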

Updated: 2020-11-01