Dynamic hand gesture recognition based on short-term sampling neural networks,IEEE/CAA Journal of Automatica Sinica



Dynamic hand gesture recognition based on short-term sampling neural networks
IEEE/CAA Journal of Automatica Sinica (IF 15.3), Pub Date: 2021-03-05, DOI: 10.1109/jas.2020.1003465
Wenjin Zhang, Jiacun Wang, Fangping Lan


Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction. The ConvNets for all groups share parameters. To learn long-term features, outputs from all ConvNets are fed into a long short-term memory (LSTM) network, by which a final classification result is predicted. The new model has been tested with two popular hand gesture datasets, namely the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated with an augmented dataset with enhanced diversity of hand gestures.
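
The short-term sampling step described in the abstract (segment each video into a fixed number of frame groups, then pick one random frame per group) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `group_bounds` and `sample_frame_indices` are hypothetical, and the rule for splitting frames into near-equal contiguous groups is an assumption, since the abstract does not say how frame counts that do not divide evenly are handled. The RGB/optical-flow fusion, the shared ConvNet, and the LSTM are left out.

```python
import random


def group_bounds(num_frames, num_groups):
    """Split frame indices 0..num_frames-1 into contiguous groups.

    Returns a list of (start, end) pairs, end exclusive. Group sizes
    differ by at most one frame (an assumption; the abstract only says
    the number of groups is fixed).
    """
    if num_groups <= 0 or num_frames < num_groups:
        raise ValueError("need 1 <= num_groups <= num_frames")
    return [
        (g * num_frames // num_groups, (g + 1) * num_frames // num_groups)
        for g in range(num_groups)
    ]


def sample_frame_indices(num_frames, num_groups, rng=None):
    """Pick one frame index uniformly at random from each group."""
    if rng is None:
        rng = random.Random()
    return [
        rng.randrange(start, end)
        for start, end in group_bounds(num_frames, num_groups)
    ]


# Example: a 37-frame clip reduced to 8 sampled frames, one per group.
indices = sample_frame_indices(37, 8, random.Random(0))
print(indices)
```

Under these assumptions, the per-video cost of feature extraction depends on the number of groups rather than on the clip length, which is one way to read the abstract's claim of avoiding intensive computation. Each sampled frame would then go through the shared ConvNet, and the resulting sequence of features would be passed to the LSTM.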


Updated: 2021-03-05
