Deep metric learning for open-set human action recognition in videos, Neural Computing and Applications

Deep metric learning for open-set human action recognition in videos
Neural Computing and Applications (IF 4.5), Pub Date: 2020-06-03, DOI: 10.1007/s00521-020-05009-z
Matheus Gutoski, André Eugênio Lazzaretti, Heitor Silvério Lopes

Human action recognition (HAR) is a widely studied topic in computer vision and pattern recognition. Despite the success of recent models, most of them approach HAR from a closed-set perspective. Closed-set recognition assumes that all classes are known a priori and appear in both the training and test phases. Unlike most previous works, we approach HAR from the open-set perspective, in which previously unknown classes are also considered by the model. Additionally, feature extraction for open-set HAR remains underexplored in the recent literature, although known classes must be represented with low intra-class variance in order to reject unknown examples. To this end, we propose a deep metric learning model named triplet inflated 3D convolutional neural network (TI3D), which builds upon the well-known I3D model. TI3D is a representation learning model that takes video sequences as input and outputs 256-dimensional representations. We perform extensive experiments and statistical comparisons on the UCF-101 dataset using a 30-fold cross-validation procedure in 25 different scenarios with varying degrees of openness and varying numbers of training and test classes. Results show that the proposed TI3D outperforms non-metric learning models in terms of \(F_1\) score and Youden's index, indicating a promising approach for open-set video action recognition.
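To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard triplet margin loss that triplet-based metric learning models commonly minimize, together with the textbook definitions of the \(F_1\) score and Youden's index. The abstract does not state the exact loss variant, distance function, margin, or how the metrics are aggregated over classes, so the Euclidean distance, the margin of 1.0, the binary known-versus-unknown confusion counts, and all function names below are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive, which pulls same-class embeddings
    together (low intra-class variance) and pushes other classes away.
    The margin value here is an illustrative assumption.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Toy 2-D embeddings (the model in the abstract outputs 256 dimensions).
anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far from the anchor, so it contributes no loss
hard_negative = [0.0, 1.5]   # only slightly farther than the positive

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)

# Hypothetical counts, treating "known class accepted" as the positive outcome.
f1 = f1_score(tp=8, fp=1, fn=2)
j = youden_index(tp=8, fn=2, tn=9, fp=1)
print(loss_easy, loss_hard, f1, j)
```

In this toy setup the easy negative already satisfies the margin and yields zero loss, while the hard negative yields a positive loss. In practice a framework-provided loss (for example, a triplet margin loss from a deep learning library) would replace the hand-written function.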

Updated: 2020-06-03
