Deep Learning Networks-Based Action Videos Classification and Search, International Journal of Pattern Recognition and Artificial Intelligence



International Journal of Pattern Recognition and Artificial Intelligence (IF 0.9), Pub Date: 2021-02-01, DOI: 10.1142/s0218001421520078
Wenshi Wang, Zhangqin Huang, Rui Tian


This work presents a deep learning method that uses fine-tuning to classify and search a diverse set of action videos. First, a 3D convolutional neural network (3D CNN), trained with a pre-training step followed by fine-tuning, is employed to extract spatiotemporal features from videos. The network is pre-trained on the UCF-101 dataset to obtain initial parameters, and a small new dataset is then used to fine-tune this initial model into the final model. Once the final CNN model has extracted the features, a distance measure is used to compute the similarity between the query video and each video in the test dataset. Matching videos are returned in ranked order, with videos more similar to the query ranked higher. The experimental comparison shows that the search method with fine-tuning outperforms the method without it. Second, classification results from the fine-tuned 3D CNN model are presented to support query by keyword. The accuracy obtained with fine-tuning is approximately 2.8% higher than that obtained without it.
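
The search step described in the abstract, ranking test videos by their feature distance to a query, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the distance measure, so Euclidean distance is assumed here, and the function names (`euclidean_distance`, `rank_videos`), the video identifiers, and the 4-dimensional feature vectors are all hypothetical stand-ins for features that a fine-tuned 3D CNN would produce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_videos(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k gallery videos closest to the query, nearest first."""
    scored = [(vid, euclidean_distance(query, feat)) for vid, feat in gallery.items()]
    # Smaller distance means higher similarity, so sort ascending.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical 4-dimensional features (assumption for illustration only);
# a real 3D CNN would emit much longer vectors.
gallery = {
    "archery_01": [0.9, 0.1, 0.0, 0.2],
    "bowling_07": [0.1, 0.8, 0.3, 0.0],
    "archery_02": [0.7, 0.3, 0.1, 0.1],
}
query = [0.85, 0.15, 0.05, 0.15]

results = rank_videos(query, gallery, top_k=2)
for video_id, distance in results:
    print(f"{video_id}: {distance:.3f}")
```

Any other measure, such as cosine distance, could be swapped in for `euclidean_distance` without changing the ranking logic, provided that smaller values still indicate greater similarity.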


Updated: 2021-02-01
