Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions

arXiv - CS - Multimedia, Pub Date: 2021-06-30, DOI: arxiv-2106.15989
Mizuki Maruyama, Shuvozit Ghose, Katsufumi Inoue, Partha Pratim Roy, Masakazu Iwamura, Michifumi Yoshioka

In recent years, Word-level Sign Language Recognition (WSLR) research has gained popularity in the computer vision community, and various approaches have been proposed. Among these, the method using the I3D network achieves the highest recognition accuracy on large public WSLR datasets. However, the I3D-based method uses only appearance information of the signer's upper body to recognize sign language words. In WSLR, information from local regions, such as hand shape and facial expression, and the positional relationship between the body and both hands are also important. In this work, we therefore use local region images of both hands and the face to capture local information, and skeletal information to capture the positions of both hands relative to the body. Specifically, we propose a novel multi-stream WSLR framework that extends the I3D network with a stream for local region images and a stream for skeletal information to improve WSLR recognition accuracy. Experimental results on the WLASL dataset show that the proposed method improves Top-1 accuracy by about 15% over existing conventional methods.
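The abstract does not say how local regions are cropped, how skeletal positions are normalized, or how the streams are combined. The sketch below is one plausible reading, not the authors' implementation. It assumes 2D keypoints are already available from some pose estimator, crops a padded box around a hand, expresses a hand position relative to the shoulder midpoint, and combines per-stream class scores by averaging softmax outputs (late fusion). All function names, the margin value, the shoulder-width normalization, and the fusion rule are hypothetical choices made for illustration.

```python
import math

def crop_box(points, margin=0.25):
    """Padded bounding box (x0, y0, x1, y1) around 2D keypoints (assumed cropping rule)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    pad = margin * max(max(xs) - min(xs), max(ys) - min(ys))
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)

def relative_to_body(point, left_shoulder, right_shoulder):
    """Position relative to the shoulder midpoint, scaled by shoulder width (assumed normalization)."""
    cx = (left_shoulder[0] + right_shoulder[0]) / 2
    cy = (left_shoulder[1] + right_shoulder[1]) / 2
    width = math.dist(left_shoulder, right_shoulder)
    return ((point[0] - cx) / width, (point[1] - cy) / width)

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(stream_logits, weights=None):
    """Weighted average of per-stream softmax scores (assumed late-fusion rule)."""
    names = list(stream_logits)
    weights = weights or {name: 1.0 for name in names}
    weight_sum = sum(weights[name] for name in names)
    fused = [0.0] * len(stream_logits[names[0]])
    for name in names:
        for i, p in enumerate(softmax(stream_logits[name])):
            fused[i] += weights[name] * p / weight_sum
    return fused

def top1(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy inputs: three hand keypoints, two shoulders, and logits for three sign classes.
hand_keypoints = [(100.0, 200.0), (120.0, 240.0), (110.0, 220.0)]
hand_box = crop_box(hand_keypoints)
wrist_rel = relative_to_body((150.0, 300.0), (100.0, 200.0), (200.0, 200.0))
example_logits = {
    "upper_body": [2.0, 1.0, 0.0],    # stand-in for an appearance stream
    "local_regions": [0.0, 3.0, 0.0], # stand-in for a hands-and-face stream
    "skeleton": [0.0, 2.0, 1.0],      # stand-in for a skeletal stream
}
fused_scores = fuse_streams(example_logits)
predicted_class = top1(fused_scores)
```

Averaging scores is only one way to combine streams. Feature-level concatenation before a shared classifier would be an equally valid assumption, and the abstract does not indicate which the authors used.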

Updated: 2021-07-01
