Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition


arXiv - CS - Graphics. Pub Date: 2021-01-27, DOI: arxiv-2101.11530
Pranay Gupta, Divyanshu Sharma, Ravi Kiran Sarvadevabhatla

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between the action sequence embedding and the embeddings of Part-of-Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., to recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets with multiple splits. Our results demonstrate SynSE's state-of-the-art performance in both ZSL and GZSL settings compared to strong baselines on the NTU-60 and NTU-120 datasets.
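The abstract does not specify the form of the confidence-based gate, so the following is only a minimal sketch of the general idea under stated assumptions: a seen-class classifier and a zero-shot classifier each produce logits, and a hard threshold on the seen classifier's top softmax probability decides which of the two to trust. The function names, the threshold value, and the class labels are hypothetical and are not taken from the paper; SynSE's actual gate may well be learned rather than fixed.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gated_predict(seen_logits, unseen_logits, seen_labels, unseen_labels,
                  threshold=0.5):
    """Hypothetical hard gate for GZSL (an assumption, not the paper's gate).

    If the seen-class classifier is confident (top softmax probability is at
    least `threshold`), return its prediction; otherwise fall back to the
    zero-shot classifier's prediction over unseen classes.
    """
    seen_probs = softmax(np.asarray(seen_logits, dtype=float))
    unseen_probs = softmax(np.asarray(unseen_logits, dtype=float))
    if seen_probs.max() >= threshold:
        return seen_labels[int(seen_probs.argmax())]
    return unseen_labels[int(unseen_probs.argmax())]


# Illustrative labels only; they are not the dataset's actual class splits.
confident = gated_predict([5.0, 0.0], [0.0, 1.0],
                          ["walk", "sit"], ["jump", "wave"])
uncertain = gated_predict([0.1, 0.0, 0.05], [0.0, 2.0],
                          ["walk", "sit", "clap"], ["jump", "wave"])
```

In the first call the seen classifier is confident, so its top class is returned; in the second its probabilities are nearly uniform, so the gate routes the sample to the zero-shot classifier instead.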


Updated: 2021-01-28
