Two-stream lightweight sign language transformer
Machine Vision and Applications (IF 3.3), Pub Date: 2022-08-24, DOI: 10.1007/s00138-022-01330-w
Yuming Chen, Xue Mei, Xuan Qin

Despite recent progress in video-based continuous sign language translation, many deep learning models are difficult to apply to real-time translation under limited computing resources. We present a two-stream lightweight sign transformer network for recognizing and translating continuous sign language. This lightweight framework captures both the static spatial information and the whole-body dynamic features of the signer, and uses a transformer-style decoder to translate sentences in real time from the spatio-temporal context around the signer. Additionally, its attention mechanism focuses on the signer's moving hands and mouth, which are often crucial for the semantic understanding of sign language. In this paper, we also introduce the Chinese Sign Language corpus of the business scene (CSLBS), which consists of 3080 high-quality videos and offers considerable impetus for further research on Chinese sign language translation. Experiments are carried out on PHOENIX-Weather 2014T (Camgoz et al., in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), pp. 7784–7793, 2018), the Chinese Sign Language dataset (Huang et al., in: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 2257–2264, 2018), and our CSLBS. Using only raw RGB and RGB difference frames as input, the proposed model outperforms the state of the art in both inference time and accuracy.
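The abstract states that the model takes only raw RGB frames and RGB difference frames as its two input streams. The sketch below shows one plausible way to build those two streams from a clip; the helper names (`rgb_difference`, `two_stream_inputs`), the frame alignment, and the scaling to [0, 1] and [-1, 1] are illustrative assumptions, not the authors' preprocessing, which the abstract does not specify.

```python
import numpy as np


def rgb_difference(frames: np.ndarray) -> np.ndarray:
    """Hypothetical helper: difference between consecutive RGB frames.

    frames: array of shape (T, H, W, 3), dtype uint8.
    Returns a float32 array of shape (T-1, H, W, 3) where entry t is
    frames[t + 1] - frames[t], a cheap motion cue.
    """
    f = frames.astype(np.float32)  # cast first so uint8 subtraction cannot wrap around
    return f[1:] - f[:-1]


def two_stream_inputs(frames: np.ndarray):
    """Hypothetical helper: build (static_stream, motion_stream) for one clip.

    Assumption: the static stream keeps frames 1..T-1 so that each RGB frame
    lines up with the difference that ends at it.
    """
    diffs = rgb_difference(frames)
    static = frames[1:].astype(np.float32) / 255.0  # appearance, scaled to [0, 1]
    motion = diffs / 255.0                          # motion, scaled to [-1, 1]
    return static, motion


if __name__ == "__main__":
    clip = np.zeros((4, 2, 2, 3), dtype=np.uint8)
    clip[2] = 10  # a brief brightness change in frame 2
    static, motion = two_stream_inputs(clip)
    print(static.shape, motion.shape)
    print(rgb_difference(clip)[:, 0, 0, 0])
```

Both streams have the same shape, so each could be fed to its own lightweight encoder before fusion; how the paper actually fuses them is not described in the abstract.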




Updated: 2022-08-25