Deep motion templates and extreme learning machine for sign language recognition
The Visual Computer (IF 3.0), Pub Date: 2019-07-25, DOI: 10.1007/s00371-019-01725-3
Javed Imran, Balasubramanian Raman

Sign language is a visual language used by people with hearing and speech impairments to communicate through fingerspelling and body gestures. This paper proposes a framework for automatic sign language recognition that does not require hand segmentation. The proposed method first generates three types of motion templates: the motion history image, the dynamic image, and our proposed RGB motion image. These motion templates are used to fine-tune three ConvNets pretrained on the ImageNet dataset. Fine-tuning avoids learning all the parameters from scratch, leading to faster network convergence even with a small number of training samples. To combine the outputs of the three ConvNets, we propose a fusion technique based on the kernel-based extreme learning machine (KELM). The features extracted from the last fully connected layer of the trained ConvNets are used to train three KELMs, and the final class label is predicted by averaging their scores. The proposed approach is validated on several publicly available sign language and human action recognition datasets, where it achieves state-of-the-art results. Finally, an Indian sign language dataset is also collected using a thermal camera. The experimental results show that our ConvNet-based deep features, together with the proposed KELM-based fusion, are robust for any type of human motion recognition.
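
The score-averaging fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard closed-form KELM solution, beta = (K + I/C)^-1 T, with an RBF kernel, and the names `KELM`, `rbf_kernel` and `fuse_predict`, the values of `C` and `gamma`, and the synthetic features standing in for the ConvNet fully connected layer outputs are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distance between every row of A and every row of B.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM with the standard closed form (assumed, not taken from the paper)."""

    def __init__(self, C=100.0, gamma=0.1):
        self.C = C          # regularisation constant (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y, n_classes):
        self.X_train = X
        # One-vs-all targets: +1 for the true class, -1 elsewhere.
        T = -np.ones((len(y), n_classes))
        T[np.arange(len(y)), y] = 1.0
        K = rbf_kernel(X, X, self.gamma)
        # Solve (K + I/C) beta = T instead of forming an explicit inverse.
        self.beta = np.linalg.solve(K + np.eye(len(y)) / self.C, T)
        return self

    def scores(self, X):
        # Per-class scores for new samples: k(x)^T beta.
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta

def fuse_predict(models, feature_sets):
    # Average the score matrices of all streams, then take the best class.
    avg = np.mean([m.scores(F) for m, F in zip(models, feature_sets)], axis=0)
    return avg.argmax(axis=1)

# Toy usage: random clustered vectors stand in for the features of three streams.
rng = np.random.default_rng(0)
y_train = np.repeat(np.arange(3), 10)   # 3 classes, 10 training samples each
y_test = np.repeat(np.arange(3), 4)     # 4 test samples per class
models, test_feats = [], []
for _ in range(3):                      # one stream per motion template type
    centers = rng.normal(scale=5.0, size=(3, 8))
    X_train = centers[y_train] + 0.1 * rng.normal(size=(30, 8))
    X_test = centers[y_test] + 0.1 * rng.normal(size=(12, 8))
    models.append(KELM().fit(X_train, y_train, n_classes=3))
    test_feats.append(X_test)
pred = fuse_predict(models, test_feats)
```

Averaging scores rather than hard labels lets a confident stream outweigh an uncertain one, which is one plausible reason to fuse at the score level; the abstract does not state the kernel or hyperparameters used.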

Updated: 2019-07-25
