Ventral & Dorsal Stream Theory based Zero-Shot Action Recognition
Pattern Recognition (IF 7.5), Pub Date: 2021-03-28, DOI: 10.1016/j.patcog.2021.107953
Meng Xing , Zhiyong Feng , Yong Su , Weilong Peng , Jianhai Zhang

Most Zero-Shot Action Recognition (ZSAR) methods build a visual-semantic joint embedding space from commonly used visual features and semantic embeddings in order to learn the correlations between actions. However, when visual features are extracted without structural guidance, the sparse video features that reflect these correlations tend to be lost. Based on the Ventral & Dorsal Stream Theory (VD), we propose VD-ZSAR, a method that extracts non-redundant visual features and thereby relieves the relation ambiguity caused by redundant ones. A visual-semantic joint embedding space is then learned by combining this non-redundant visual space with a semantic space. Specifically, the visual space is constructed from the motion cues perceived by the Dorsal Stream and the object cues perceived by the Ventral Stream, while the semantic space is constructed by a sentence-to-vector generator. The joint embedding space is built by a nonlinear similarity metric learning mechanism, which better reflects the correlations between actions implicitly. Extensive experiments on the Olympic, HMDB51 and UCF101 datasets validate the favorable performance of the proposed approach.
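The abstract does not specify the network in detail. As a rough, non-authoritative illustration of the general pattern it describes, the sketch below concatenates motion ("dorsal") and object ("ventral") feature vectors into one visual vector, projects it and a class's semantic embedding into a shared space, and scores the pair with a small nonlinear network; the unseen class with the highest score is predicted. Everything here is an assumption for illustration: the names `NonlinearSimilarity`, `build_visual_feature` and `zero_shot_predict`, the dimensions, and the two-layer MLP scorer are hypothetical, the weights are random rather than learned on seen classes, and the inputs are random stand-ins for real video features and sentence embeddings.

```python
import math
import random


def relu(xs):
    return [max(0.0, x) for x in xs]


def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); returns the product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class NonlinearSimilarity:
    """Hypothetical MLP scorer for (video, class) pairs.

    Weights are random here; a trained model would learn them on seen classes.
    """

    def __init__(self, vis_dim, sem_dim, joint_dim=16, hidden=8, seed=0):
        rng = random.Random(seed)
        self.W_v = rand_matrix(rng, joint_dim, vis_dim)     # visual -> joint space
        self.W_s = rand_matrix(rng, joint_dim, sem_dim)     # semantic -> joint space
        self.W_1 = rand_matrix(rng, hidden, 2 * joint_dim)  # pair -> hidden layer
        self.w_2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]  # hidden -> scalar

    def score(self, visual, semantic):
        v = relu(matvec(self.W_v, visual))
        s = relu(matvec(self.W_s, semantic))
        h = relu(matvec(self.W_1, v + s))  # v + s is list concatenation here
        z = sum(w * hi for w, hi in zip(self.w_2, h))
        return 1.0 / (1.0 + math.exp(-z))  # similarity squashed into (0, 1)


def build_visual_feature(motion_feat, object_feat):
    # Concatenate motion ("dorsal") cues and object ("ventral") cues.
    return list(motion_feat) + list(object_feat)


def zero_shot_predict(model, visual, class_embeddings):
    """Return (index of best-scoring unseen class, list of all scores)."""
    scores = [model.score(visual, e) for e in class_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Usage with random stand-in data: 12-d motion, 20-d object, 4 unseen classes.
rng = random.Random(1)
motion = [rng.gauss(0.0, 1.0) for _ in range(12)]
objects = [rng.gauss(0.0, 1.0) for _ in range(20)]
class_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(4)]
model = NonlinearSimilarity(vis_dim=32, sem_dim=10)
pred, scores = zero_shot_predict(
    model, build_visual_feature(motion, objects), class_embeddings
)
```

Scoring each (video, class) pair with a learned nonlinear function, rather than a fixed cosine distance, is one common way to realize "nonlinear similarity metric learning"; whether VD-ZSAR uses this exact form is not stated in the abstract.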

Updated: 2021-04-04