A Cross-Attention BERT-Based Framework for Continuous Sign Language Recognition, IEEE Signal Processing Letters

IEEE Signal Processing Letters (IF 3.2), Pub Date: 2022-08-17, DOI: 10.1109/lsp.2022.3199665
Zhenxing Zhou, Vincent W.L. Tam, Edmund Y. Lam

Continuous sign language recognition (CSLR) is a challenging task that draws on various signal processing techniques to infer the sequence of glosses performed by a signer. Existing CSLR approaches typically use multiple input modalities, such as raw video data and extracted hand images, to improve recognition accuracy. However, the large differences between modalities make it difficult to design an integrative framework that effectively exchanges and combines the knowledge obtained from each modality, so that the modalities complement one another and make the framework more robust to gesture variations and background noise. To address this issue, we propose a novel cross-attention deep learning framework named CA-SignBERT. This framework uses multiple Bidirectional Encoder Representations from Transformers (BERT) models to analyze the information from different modalities. Between these BERT models, we introduce a special cross-attention mechanism to ensure efficient inter-modality knowledge exchange. In addition, we propose a weight control module that dynamically hybridizes their outputs. Experimental results show that CA-SignBERT attains state-of-the-art performance on four benchmark CSLR datasets.
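The abstract names two mechanisms: cross-attention between the per-modality BERT streams, and a weight control module that mixes their outputs. The snippet below is a minimal, dependency-free sketch of those two ideas under our own assumptions. It is not the authors' implementation. It assumes both streams (called `video` and `hands` here for illustration) are already encoded as frame-aligned feature sequences of equal dimension. It also drops the learned query/key/value projections and multi-head structure of a real BERT layer. Finally, it models the weight control module as a hypothetical per-frame softmax gate; `gate_vectors` is an illustrative parameter, not something described in the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per frame


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention in which one modality
    queries the other. Simplification (assumption): no learned Q/K/V
    projections, so keys and values are the raw context frame vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        # Attention-weighted sum of the other modality's frame vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def exchange(video: Matrix, hands: Matrix) -> Tuple[Matrix, Matrix]:
    """Each stream attends to the other; the residual connection keeps
    each stream's own features alongside what it borrowed."""
    v2 = [[a + b for a, b in zip(x, y)] for x, y in zip(video, cross_attention(video, hands))]
    h2 = [[a + b for a, b in zip(x, y)] for x, y in zip(hands, cross_attention(hands, video))]
    return v2, h2


def weighted_fusion(streams: List[Matrix], gate_vectors: List[Vector]) -> Matrix:
    """Hypothetical per-frame weight control: score each stream with its own
    gate vector, softmax the scores across streams, and mix the frames.
    Requires all streams to have the same number of frames and dimension."""
    num_frames = len(streams[0])
    dim = len(streams[0][0])
    fused = []
    for t in range(num_frames):
        alphas = softmax([dot(g, s[t]) for g, s in zip(gate_vectors, streams)])
        fused.append([sum(a * s[t][j] for a, s in zip(alphas, streams)) for j in range(dim)])
    return fused


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2-dim features
    hands = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    v2, h2 = exchange(video, hands)
    fused = weighted_fusion([v2, h2], gate_vectors=[[1.0, 0.0], [0.0, 1.0]])
    print(len(fused), len(fused[0]))  # prints: 3 2
```

Because the gate applies a softmax across streams, the per-frame weights are positive and sum to 1, so each fused frame is a convex combination of the two streams. The paper's actual weight control module may be parameterized differently.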

Updated: 2024-08-28
