Dynamic Gesture Recognition by Using CNNs and star RGB: a Temporal Information Condensation
Neurocomputing (IF 5.5), Pub Date: 2020-08-01, DOI: 10.1016/j.neucom.2020.03.038
Clebeson Canuto dos Santos, Jorge Leonid Aching Samatelo, Raquel Frizera Vassallo

Abstract Due to technological advances, machines are increasingly present in people's daily lives. Thus, there has been growing effort to develop interfaces that provide intuitive ways of interaction, such as dynamic gestures. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, it would be more interesting if only color information were used, since RGB cameras are available in almost every public place and could be used for gesture recognition without the need to install additional equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using color alone. With this in mind, we propose a technique capable of condensing a dynamic gesture, shown in a video, into just one RGB image. We call this technique star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer, which indicates the class of the gesture present in the input video. Experiments were carried out using the Montalbano, GRIT, and isoGD datasets. For the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, which reaches the state of the art for this dataset when only color information is considered. For the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the dataset authors' approach by more than 6%. Regarding the large-scale isoGD dataset, the proposal achieved an accuracy of 52.18%. However, taking into account the complexity of the dataset (eight different gesture categories) and the number of classes (249), we consider our approach competitive with previous ones, since we employed only color information to recognize gestures instead of all the available multimodal data usually used by other methods.
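The pipeline the abstract describes (condense a gesture video into one RGB image, extract features with two backbone branches, fuse them with a soft-attention ensemble, and classify with a fully connected layer) can be sketched in NumPy. Note the hedges: the condensation step below is a generic weighted frame accumulation standing in for the star RGB transform, whose exact formulation is defined in the paper; the feature dimensions, attention weights, and random inputs are purely illustrative assumptions, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video: T frames of H x W RGB pixels in [0, 1].
T, H, W = 8, 32, 32
video = rng.random((T, H, W, 3))

# Temporal condensation: a weighted accumulation of frames into a single
# RGB image (a stand-in for the star RGB transform, not its actual formula).
weights = np.linspace(0.1, 1.0, T)          # emphasize later frames (assumption)
weights /= weights.sum()
condensed = np.tensordot(weights, video, axes=1)   # shape (H, W, 3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Two branch feature vectors, standing in for the two ResNet outputs
# (64-dim is a hypothetical feature size).
f1, f2 = rng.random(64), rng.random(64)

# Soft-attention ensemble: score each branch, softmax the scores, and take
# the attention-weighted combination of the branch features.
att_w = rng.random((2, 64))                 # attention scoring weights (hypothetical)
scores = np.array([att_w[0] @ f1, att_w[1] @ f2])
alpha = softmax(scores)                     # attention weights, sum to 1
fused = alpha[0] * f1 + alpha[1] * f2

# Fully connected layer over the gesture classes (Montalbano has 20 gestures).
n_classes = 20
fc_w = rng.random((n_classes, 64))
probs = softmax(fc_w @ fused)
pred = int(np.argmax(probs))               # predicted gesture class index
```

The key design point the abstract highlights is that all temporal information is folded into `condensed` before any CNN runs, so a standard image classifier suffices downstream; the soft attention lets the ensemble weight each branch per input rather than averaging them uniformly.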
