Trear: Transformer-Based RGB-D Egocentric Action Recognition
IEEE Transactions on Cognitive and Developmental Systems (IF 5.0), Pub Date: 2021-01-01, DOI: 10.1109/tcds.2020.3048883
Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li

In this article, we propose a transformer-based RGB-D egocentric action recognition framework, called Trear. It consists of two modules: 1) an interframe attention encoder and 2) a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. Input frames are cropped randomly to mitigate the effect of data redundancy. Features from each modality interact through the proposed fusion block and are combined through a simple yet effective fusion operation to produce a joint RGB-D representation. Empirical experiments on two large egocentric RGB-D data sets, 1) THU-READ and 2) first-person hand action, and one small data set, wearable computer vision systems, have shown that the proposed method outperforms the state-of-the-art results by a large margin.
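The mutual-attentional fusion described in the abstract can be sketched in plain NumPy: each modality's per-frame features attend to the other modality via scaled dot-product attention, and the attended features are then combined by a simple element-wise operation. This is an illustrative assumption, not the paper's exact design: the function names, the absence of learned projections, and the additive fusion are stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention: one modality's frame features
    (queries) attend to the other modality's features (context).
    A hypothetical sketch of the mutual-attention step, without the
    learned Q/K/V projections a real transformer block would use."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context

# Toy per-frame features: T=4 frames, d=8 channels per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8))
depth = rng.standard_normal((4, 8))

# Each modality attends to the other; the attended features are then
# fused element-wise (addition here is a stand-in for the paper's
# "simple yet effective fusion operation").
rgb_attended = cross_attention(rgb, depth)
depth_attended = cross_attention(depth, rgb)
joint = rgb_attended + depth_attended  # joint RGB-D representation, shape (4, 8)
```

The attention weights form a row-stochastic matrix, so each fused frame feature is a convex combination of the other modality's frames plus the symmetric counterpart, which is what lets information flow in both directions between RGB and depth streams.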

Updated: 2021-01-01