SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2018-05-25, DOI: 10.1109/tcyb.2018.2832053
Meijun Sun , Ziqi Zhou , Qinghua Hu , Zheng Wang , Jianmin Jiang

Data-driven saliency detection has attracted strong interest as a result of applying convolutional neural networks to the detection of eye fixations. Although a number of image-based salient object and fixation detection models have been proposed, video fixation detection remains underexplored. Unlike in still-image analysis, motion and temporal information are crucial factors affecting human attention when viewing video sequences. Although existing models based on local contrast and low-level features have been extensively researched, they fail to simultaneously consider interframe motion and temporal information across neighboring video frames, leading to unsatisfactory performance on complex scenes. To this end, we propose a novel and efficient video eye fixation detection model that improves saliency detection performance. By simulating the memory and visual attention mechanisms that humans exhibit when watching a video, we propose a step-gained fully convolutional network (SG-FCN) that combines memory information on the time axis with motion information on the space axis while storing the saliency information of the current frame. The model is trained hierarchically, which ensures detection accuracy. Extensive experiments comparing against 11 state-of-the-art methods show that the proposed model outperforms all of them across several publicly available datasets.
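To make the fusion idea concrete, below is a minimal PyTorch sketch of combining a spatial (appearance) stream, a motion stream, and a stored previous-frame saliency map into a single fixation map. All names, layer sizes, and the frame-difference motion proxy are illustrative assumptions; this is not the paper's actual SG-FCN architecture or training procedure.

```python
import torch
import torch.nn as nn

class StepGainedFusion(nn.Module):
    """Minimal sketch of fusing spatial, motion, and memory cues.

    Hypothetical layer sizes and streams; the published SG-FCN
    architecture and hierarchical training are not reproduced here.
    """

    def __init__(self, channels: int = 32):
        super().__init__()
        # Spatial stream: encodes the appearance of the current frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Motion stream: encodes the frame difference, a simple
        # stand-in for interframe motion information.
        self.motion = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Fusion head: combines both streams with the stored saliency
        # map of the previous frame (the "memory" cue on the time axis).
        self.fuse = nn.Conv2d(2 * channels + 1, 1, kernel_size=1)

    def forward(self, frame, prev_frame, prev_saliency):
        spatial_cue = self.spatial(frame)
        motion_cue = self.motion(frame - prev_frame)
        x = torch.cat([spatial_cue, motion_cue, prev_saliency], dim=1)
        # Sigmoid yields a per-pixel fixation probability map, which
        # would be stored and fed back as memory for the next frame.
        return torch.sigmoid(self.fuse(x))

if __name__ == "__main__":
    # Usage: frames are B x 3 x H x W; saliency maps are B x 1 x H x W.
    model = StepGainedFusion()
    f0 = torch.rand(1, 3, 64, 64)
    f1 = torch.rand(1, 3, 64, 64)
    s0 = torch.zeros(1, 1, 64, 64)  # no memory before the first frame
    s1 = model(f1, f0, s0)
    print(s1.shape)  # torch.Size([1, 1, 64, 64])
```

Feeding each frame's output back as the next frame's memory input mirrors the abstract's description of storing the current frame's saliency while propagating temporal information along the sequence.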

Updated: 2024-08-22