TVENet: Temporal Variance Embedding Network for Fine-grained Action Representation


Pattern Recognition (IF 8), Pub Date: 2020-07-01, DOI: 10.1016/j.patcog.2020.107267
Tingting Han, Hongxun Yao, Wenlong Xie, Xiaoshuai Sun, Sicheng Zhao, Jun Yu

Abstract: With the breakthroughs in general action understanding, analyzing actions at a finer granularity has become an inevitable trend. However, related research has been largely hindered by the lack of fine-grained datasets and by the difficulty of capturing subtle differences between fine-grained actions that are highly similar overall. In this paper, we address these challenges by constructing a fine-grained action dataset, Figure Skating, which can be used for end-to-end network training, and by presenting a framework for the joint optimization of classification and similarity constraints. We propose to incorporate the triplet loss into the training of a Convolutional Neural Network, which learns a mapping from fine-grained actions to a compact Euclidean space where distances directly correspond to a measure of action similarity. The triplet loss compels actions of distinct classes to have larger distances than actions of the same class. In addition, to make fine-grained actions easier to discriminate, we further propose a temporal variance embedding network (TVENet), which embeds temporal context variances into the feature embeddings during the joint network training. Experimental results on the Figure Skating, HMDB51, and UCF101 datasets demonstrate the effectiveness of the TVENet representation for fine-grained action search.
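The joint objective described in the abstract, a classification loss combined with a triplet similarity constraint, can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean distance, the `margin` value, the `weight` balancing the two terms, and all function names are assumptions made for illustration, and the abstract does not specify them.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin is an assumed hyperparameter).

    The loss is zero once the negative (different class) is farther
    from the anchor than the positive (same class) by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(logits, label, anchor, positive, negative,
               margin=1.0, weight=1.0):
    """Classification loss plus a weighted triplet term.

    `weight` is a hypothetical balancing factor, not taken from the paper.
    """
    return (cross_entropy(logits, label)
            + weight * triplet_loss(anchor, positive, negative, margin))
```

In this sketch, a triplet whose negative already lies well beyond the margin contributes nothing to the loss, so only triplets that violate the constraint would drive the embedding update.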


Updated: 2020-07-01
