A Multimodal Multilevel Converged Attention Network for Hand Gesture Recognition With Hybrid sEMG and A-Mode Ultrasound Sensing.
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2023-11-29, DOI: 10.1109/tcyb.2022.3204343
Sheng Wei, Yue Zhang, Honghai Liu

Gesture recognition based on surface electromyography (sEMG) has been widely used in the field of human-machine interaction (HMI). However, sEMG has limitations, such as a low signal-to-noise ratio and insensitivity to fine finger movements, so we add A-mode ultrasound (AUS) sensing to improve recognition performance. To explore the influence of multisource sensing data on gesture recognition and to better integrate features from the different modalities, we propose a multimodal multilevel converged attention network (MMCANet) for multisource signals composed of sEMG and AUS. The proposed model extracts hidden features of the AUS signal with a convolutional neural network (CNN), while a hybrid CNN and long short-term memory (LSTM) structure extracts spatial-temporal features from the sEMG signal. The two sets of CNN features from AUS and sEMG are then concatenated and passed to a transformer encoder, which fuses the information and interacts with the sEMG features to produce hybrid features. Finally, the classification results are output by fully connected layers. Attention mechanisms adjust the weights of the feature channels. We compared the feature extraction and classification performance of MMCANet against manually extracted sEMG-AUS features classified by four traditional machine-learning (ML) algorithms; recognition accuracy increased by at least 5.15%. We also evaluated single-modality deep learning (DL) baselines: the proposed model improved accuracy by 14.31% and 3.80% over CNN models using sEMG alone and AUS alone, respectively. Compared with several state-of-the-art fusion techniques, our method also achieved better results.
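The abstract does not include an implementation, but the pipeline it describes (a CNN branch for AUS, a CNN-LSTM branch for sEMG, concatenation into a transformer encoder, channel attention, and fully connected classification) maps onto a short model sketch. Below is a minimal, hypothetical PyTorch rendering under invented assumptions: all layer sizes, kernel shapes, the SE-style channel attention, and the input shapes (8-channel sEMG windows, 4-channel A-mode echo lines) are illustrative stand-ins, not the authors' configuration.

```python
# Hypothetical sketch of an MMCANet-style fusion model. Shapes are assumed:
# sEMG as (batch, emg_channels, time), AUS as (batch, aus_channels, depth).
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate, a stand-in for the paper's
    channel attention; the exact mechanism there is not specified here."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, L)
        return x * self.gate(x).unsqueeze(-1)  # reweight feature channels


class MMCANetSketch(nn.Module):
    def __init__(self, emg_ch=8, aus_ch=4, d_model=64, n_classes=10):
        super().__init__()
        # AUS branch: plain CNN feature extractor.
        self.aus_cnn = nn.Sequential(
            nn.Conv1d(aus_ch, d_model, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2), nn.ReLU(),
            ChannelAttention(d_model),
        )
        # sEMG branch: CNN followed by an LSTM for spatial-temporal features.
        self.emg_cnn = nn.Sequential(
            nn.Conv1d(emg_ch, d_model, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2), nn.ReLU(),
            ChannelAttention(d_model),
        )
        self.emg_lstm = nn.LSTM(d_model, d_model, batch_first=True)
        # Transformer encoder fuses the concatenated CNN token sequences.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, emg, aus):
        aus_f = self.aus_cnn(aus).transpose(1, 2)   # (B, L_a, d_model)
        emg_f = self.emg_cnn(emg).transpose(1, 2)   # (B, L_e, d_model)
        # Concatenate both CNN feature sequences and fuse via self-attention.
        fused = self.fusion(torch.cat([aus_f, emg_f], dim=1)).mean(dim=1)
        # "Interact with sEMG features": combine with the LSTM summary state.
        _, (h, _) = self.emg_lstm(emg_f)
        hybrid = torch.cat([fused, h[-1]], dim=-1)
        return self.head(hybrid)                    # class logits


if __name__ == "__main__":
    model = MMCANetSketch()
    logits = model(torch.randn(2, 8, 400), torch.randn(2, 4, 1000))
    print(logits.shape)  # torch.Size([2, 10])
```

Mean-pooling the fused token sequence and concatenating it with the LSTM summary is one plausible reading of "interact with sEMG features to produce hybrid features"; the paper may realize this interaction differently.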

Updated: 2022-09-23