A Spatial-Temporal Recurrent Neural Network for Video Saliency Prediction
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2020-11-18, DOI: 10.1109/tip.2020.3036749
Kao Zhang, Zhenzhong Chen, Shan Liu

In this paper, a recurrent neural network is designed for video saliency prediction that accounts for spatial-temporal features. In our work, video frames are routed through a static network for spatial features and a dynamic network for temporal features. For spatial-temporal feature integration, a novel select-and-reweight fusion model is proposed that automatically learns and adjusts the fusion weights according to the spatial and temporal features of different scenes. Finally, an attention-aware convolutional long short-term memory (ConvLSTM) network is developed to predict salient regions from the features extracted over consecutive frames and to generate the final saliency map for each video frame. The proposed method is compared with state-of-the-art saliency models on five public video saliency benchmark datasets. The experimental results demonstrate that our model achieves strong performance on video saliency prediction.
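The select-and-reweight fusion step can be illustrated with a minimal sketch: per-location gating scores are derived from the spatial and temporal features themselves, normalized with a softmax, and used as convex fusion weights. The function name, the scalar gating parameters `w_s`/`w_t`, and the list-based feature representation are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a small list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def select_reweight_fuse(spatial, temporal, w_s=1.0, w_t=1.0):
    """Fuse per-location spatial and temporal features with
    feature-dependent weights (hypothetical sketch, not the paper's code)."""
    fused = []
    for fs, ft in zip(spatial, temporal):
        # gating scores computed from the features, so the fusion
        # weights adapt to the scene content at each location
        a_s, a_t = softmax([w_s * fs, w_t * ft])
        fused.append(a_s * fs + a_t * ft)
    return fused

# a static-dominated location (index 0) vs. a motion-dominated one (index 1)
feat = select_reweight_fuse([0.9, 0.1], [0.2, 0.8])
```

Because the softmax weights sum to one, each fused value is a convex combination of the two streams, so the result always lies between the spatial and temporal responses at that location.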

Updated: 2020-11-27