Global Semantic Descriptors for Zero-Shot Action Recognition
IEEE Signal Processing Letters (IF 3.2) · Pub Date: 2022-08-22 · DOI: 10.1109/lsp.2022.3200605
Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti

The success of zero-shot action recognition (ZSAR) methods is intrinsically tied to the nature of the semantic side information used to transfer knowledge, yet this aspect has received little direct attention in the literature. This work introduces a new ZSAR method based on action-object and action-descriptive-sentence relationships. We demonstrate that representing all object classes with descriptive sentences yields an accurate object-action affinity estimate when a paraphrase estimation method is used as the embedder. We also show how to estimate probabilities over the set of action classes from a set of sentences alone, without hard human labeling. In our method, the probabilities from these two global classifiers (i.e., classifiers that use features computed over the entire video) are combined, producing an efficient knowledge-transfer model for action classification. Our results are state-of-the-art on the Kinetics-400 dataset and competitive on UCF-101 under the ZSAR evaluation protocol. Our code is available at https://github.com/valterlej/objsentzsar.
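The abstract describes two steps: estimating object-action affinity from sentence embeddings, and fusing the probabilities of two global classifiers over the same action classes. A minimal NumPy sketch of these steps, assuming precomputed sentence embeddings and a simple convex fusion rule (both are illustrative choices, not the paper's exact method):

```python
import numpy as np

def object_action_affinity(obj_emb, act_emb):
    """Cosine similarity between sentence embeddings of object
    descriptions (rows of obj_emb) and action descriptions (rows of
    act_emb). The paper uses a paraphrase-style sentence encoder to
    produce such embeddings; here they are plain arrays."""
    obj = obj_emb / np.linalg.norm(obj_emb, axis=1, keepdims=True)
    act = act_emb / np.linalg.norm(act_emb, axis=1, keepdims=True)
    return obj @ act.T

def combine_global_classifiers(p_obj, p_sent, alpha=0.5):
    """Fuse probability estimates from the object-based and
    sentence-based global classifiers over the same action classes.
    A convex combination weighted by alpha is an assumed fusion rule
    for illustration only."""
    fused = alpha * np.asarray(p_obj, float) + (1 - alpha) * np.asarray(p_sent, float)
    return fused / fused.sum(axis=-1, keepdims=True)

# Toy example: 2 object classes, 3 action classes, 4-d embeddings.
rng = np.random.default_rng(0)
aff = object_action_affinity(rng.normal(size=(2, 4)), rng.normal(size=(3, 4)))

p_obj = [0.2, 0.5, 0.3]   # probabilities from the object-affinity branch
p_sent = [0.1, 0.3, 0.6]  # probabilities from the sentence branch
fused = combine_global_classifiers(p_obj, p_sent)
predicted_class = int(np.argmax(fused))
```

With equal weighting, the fused distribution here is [0.15, 0.4, 0.45], so the sentence branch's preferred class wins. Any real instantiation would replace the random arrays with encoder outputs and calibrate `alpha`.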

Updated: 2024-08-28