A Light Implementation of a 3D Convolutional Network for Online Gesture Recognition
IEEE Latin America Transactions (IF 1.3) Pub Date: 2020-02-01, DOI: 10.1109/tla.2020.9085286
Fabio Brandolt Baldissera, Fabian Luis Vargas

With the advancement of machine learning techniques and the increased accessibility of computing power, Artificial Neural Networks (ANNs) have achieved state-of-the-art results in image classification and, more recently, in video classification. Gesture recognition from a video source enables more natural, contact-free human-machine interaction and greater immersion in virtual reality environments, and may even lead to sign language translation in the near future. However, the techniques used in video classification are usually computationally expensive, making them prohibitive on conventional hardware. This work studies and analyzes the applicability of continuous online gesture recognition techniques to embedded systems. This goal is achieved by proposing a new model, based on 2D and 3D CNNs, that performs online gesture recognition, i.e., it yields a label while the video frames are still being processed, in a predictive manner, before having access to future frames of the video. This capability is of paramount interest to applications in which the video is acquired concomitantly with the classification process and the labels must be issued within a strict deadline. The proposed model was tested on three representative gesture datasets from the literature. The results suggest that the proposed technique improves on the state of the art by yielding a fast gesture recognition process while maintaining high accuracy, which is fundamental for its applicability to embedded systems.
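To make the online, predictive classification scheme concrete, the following is a minimal PyTorch sketch of how a light 3D CNN can emit a label for each incoming frame from a sliding buffer of past frames. The abstract does not specify the architecture, so the class LightGestureNet3D, the online_inference helper, the layer sizes, and the clip length below are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch: online gesture recognition with a light 3D CNN.
    # All names, layer sizes, and the buffer length are illustrative only.
    import torch
    import torch.nn as nn
    from collections import deque

    class LightGestureNet3D(nn.Module):
        """Small 3D CNN mapping a short clip (C, T, H, W) to gesture logits."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1),   # spatio-temporal conv
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool space, keep time
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),                      # global average pool
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            # clip: (batch, 3, clip_len, H, W)
            x = self.features(clip).flatten(1)
            return self.classifier(x)

    def online_inference(model: nn.Module, frame_stream, clip_len: int = 8):
        """Emit a predicted label per new frame, using only past frames."""
        buffer = deque(maxlen=clip_len)      # sliding window of recent frames
        model.eval()
        with torch.no_grad():
            for frame in frame_stream:       # frame: (3, H, W) tensor
                buffer.append(frame)
                if len(buffer) < clip_len:
                    continue                 # warm-up: not enough history yet
                clip = torch.stack(list(buffer), dim=1).unsqueeze(0)  # (1,3,T,H,W)
                logits = model(clip)
                yield logits.argmax(dim=1).item()  # label before the gesture ends

As a toy run, feeding 32 random 64x64 frames, e.g. list(online_inference(LightGestureNet3D(), (torch.rand(3, 64, 64) for _ in range(32)))), yields one label per frame once the buffer fills, mirroring the predictive, deadline-friendly labeling described in the abstract.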

Updated: 2020-02-01