A weakly supervised CNN model for spatial localization of human activities in unconstraint environment
Signal, Image and Video Processing ( IF 2.0 ) Pub Date : 2020-01-31 , DOI: 10.1007/s11760-019-01633-y
N. Kumar , N. Sukavanam

Human action localization in a given video sequence refers to recovering the spatial and temporal extent of a specified action. Like action recognition, action localization plays an important role in security, disease diagnosis and geographical systems. Localizing an action helps with tracking, detection and prediction of the event concerned. The main difficulty arises when long, untrimmed and highly occluded videos must be processed in uncontrolled conditions, because obtaining annotations for every action is expensive and laborious. Motivated by the recent state of the art in deep learning for image classification, we present a weakly supervised action localization model based on a deep neural network. The proposed model is useful when dealing with large amounts of data, since developing a large network consumes more computational resources and often raises overfitting issues. We use the Inception V3 model (GoogLeNet) framework, which uses TensorFlow as its backend and applies batch normalization along with the convolution layers. Batch normalization efficiently removes the covariate shift problem between network layers. The approach developed in this work is validated on the UCF50 and UCF Sports action benchmark datasets. The proposed model gives satisfactory results on both datasets (UCF50 and UCF Sports); it can perform better on long untrimmed video sequences captured in unconstrained environments. Important applications of this work can be found in very sensitive tasks, such as automatic localization of hidden objects and detecting enemy positions under camera surveillance.
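The abstract relies on batch normalization to counter shifts in the distribution of activations between layers. As a rough illustration only, the training-time computation can be sketched in plain Python as below. This is not the authors' implementation: the function name `batch_norm`, its parameters `gamma`, `beta` and `eps`, and the toy input are assumptions made for this example. Frameworks such as TensorFlow additionally learn the scale and shift per feature and keep running statistics for use at inference time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of `batch` across the batch dimension.

    `batch` is a list of equal-length rows (one row per sample).
    `gamma` and `beta` are a scalar scale and shift, fixed here for simplicity
    (an assumption; real layers learn one pair per feature).
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased (population) variance over the mini-batch.
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma * (column[i] - mean) * inv_std + beta
    return out


# Toy mini-batch: three samples, two features on very different scales.
activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(activations)
```

After normalization both feature columns have approximately zero mean and unit variance, regardless of their original scale, which is the property the abstract appeals to.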

Updated: 2020-01-31