Saliency Prediction Network for 360° Videos, IEEE Journal of Selected Topics in Signal Processing

IEEE Journal of Selected Topics in Signal Processing (IF 8.7), Pub Date: 2020-01-01, DOI: 10.1109/jstsp.2019.2955824
Youqiang Zhang, Feng Dai, Yike Ma, Hongliang Li, Qiang Zhao, Yongdong Zhang

Panoramic videos are becoming increasingly easy for ordinary users to obtain. Although these videos have a 360° field of view, they are usually displayed through perspective views, so selecting a viewing angle requires saliency information. In this paper, we propose a saliency prediction network for 360° videos. Our network takes video frames and optical flows in cube map format as input, so it does not suffer from the image distortions of panoramic frames. The network is composed of a feature encoding module and a saliency prediction module. The feature encoding module extracts spatial and temporal features, which are then processed by a decoder and a bidirectional convolutional LSTM for saliency prediction. To mine the motion information more thoroughly, the temporal stream of the feature encoding module accepts optical flows from both before and after the current frame. Considering the viewing behavior of 360° videos, we also incorporate the global feature of video frames, residual attention, and Gaussian priors into the network, which improves performance. To evaluate our method, we compare it with three state-of-the-art saliency prediction algorithms on two publicly available datasets. The experimental results demonstrate the effectiveness of our method, which achieves the best performance.
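The abstract states that frames enter the network in cube map format to avoid the distortions of panoramic frames. As an illustration only, the following minimal NumPy sketch resamples an equirectangular frame onto the six faces of a cube by nearest-neighbour lookup. The function names, the face order in `FACES`, and the face orientation conventions are our own assumptions; the abstract does not specify the paper's projection details or whether interpolation is used.

```python
import numpy as np

# Face order is a hypothetical convention for this sketch.
FACES = ("front", "right", "back", "left", "up", "down")


def equirect_to_cube_face(equi, face, size):
    """Nearest-neighbour sample of one cube face from an equirectangular frame.

    equi: (H, W) or (H, W, C) array spanning 360 deg of longitude (columns)
    and 180 deg of latitude (rows, top row = north pole).
    """
    h, w = equi.shape[:2]
    a = (np.arange(size) + 0.5) / size * 2.0 - 1.0   # pixel centres in [-1, 1]
    u, v = np.meshgrid(a, a)                         # u: left->right, v: top->bottom
    one = np.ones_like(u)
    # Ray direction per face pixel: x right, y up, z forward (front face centre).
    dirs = {
        "front": (u, -v, one),
        "right": (one, -v, -u),
        "back": (-u, -v, -one),
        "left": (-one, -v, u),
        "up": (u, one, v),
        "down": (u, -one, -v),
    }
    x, y, z = dirs[face]
    lon = np.arctan2(x, z)                # longitude in [-pi, pi], 0 = front
    lat = np.arctan2(y, np.hypot(x, z))   # latitude in [-pi/2, pi/2], + = up
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return equi[row, col]


def cube_faces(equi, size):
    """Stack all six faces: (6, size, size) or (6, size, size, C)."""
    return np.stack([equirect_to_cube_face(equi, f, size) for f in FACES])
```

Bilinear interpolation would give smoother faces; nearest-neighbour keeps the sketch short and makes the pixel mapping easy to inspect.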
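The abstract says the temporal stream receives optical flows from both before and after the current frame. One plausible way to form that input, sketched below under assumptions of our own, is to concatenate the flow into frame t and the flow out of frame t along the channel axis. The function `temporal_input`, the `(H, W, 2)` flow layout, and zero-filling at clip boundaries are hypothetical choices, not details taken from the paper; for cube map input it could be applied to each face separately.

```python
import numpy as np


def temporal_input(flows, t):
    """Stack the optical flow before and after frame t along the channel axis.

    Hypothetical layout: flows[k] is the (H, W, 2) flow from frame k to
    frame k + 1, so a clip of T frames has T - 1 flow fields. Missing flows
    at the clip boundaries are replaced by zeros (an assumption).
    """
    num_frames = len(flows) + 1
    if not 0 <= t < num_frames:
        raise IndexError("frame index out of range")
    zeros = np.zeros_like(flows[0])
    before = flows[t - 1] if t > 0 else zeros        # frame t-1 -> t
    after = flows[t] if t < len(flows) else zeros    # frame t -> t+1
    return np.concatenate([before, after], axis=-1)  # (H, W, 4)
```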
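Residual attention is listed among the network's components but is not defined in the abstract. A common formulation, from residual attention networks, multiplies features by `(1 + M)`, where `M` is a sigmoid mask, so attention can amplify responses without discarding the identity path. The NumPy sketch below assumes that formulation and a hypothetical mask branch consisting of a single 1x1 convolution; the paper's actual mask branch is not described in the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def residual_attention(features, mask_weights, mask_bias=0.0):
    """Apply a residual attention mask to a (C, H, W) feature map.

    Hypothetical mask branch: a 1x1 convolution (weighted sum over channels)
    followed by a sigmoid, giving one spatial mask M in (0, 1). The output is
    (1 + M) * F, so every response is scaled by a factor between 1 and 2.
    """
    logits = np.tensordot(mask_weights, features, axes=([0], [0])) + mask_bias  # (H, W)
    mask = sigmoid(logits)
    return (1.0 + mask)[None, :, :] * features
```

Because the factor never drops below 1, a poorly trained mask cannot zero out features, which is the usual motivation for the residual form over plain multiplicative attention.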
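The abstract mentions Gaussian priors chosen with the viewing behavior of 360° videos in mind. A frequently reported tendency is that viewers look near the equator, so the sketch below assumes latitude-only Gaussian maps on an equirectangular grid, appended to a feature map as extra channels. The sigma values, the equirectangular layout, and both function names are illustrative assumptions; in a cube map pipeline the priors would have to be expressed per face, and the paper may learn its Gaussian parameters rather than fix them.

```python
import numpy as np


def equator_gaussian_priors(height, width, sigmas_deg=(10.0, 20.0, 40.0)):
    """Return (len(sigmas), height, width) latitude-only Gaussian prior maps.

    Each map peaks on the equator (latitude 0) of an equirectangular grid and
    is constant along longitude. The sigma values are illustrative guesses.
    """
    lat = 90.0 - (np.arange(height) + 0.5) / height * 180.0  # row centres, degrees
    maps = []
    for s in sigmas_deg:
        col = np.exp(-(lat ** 2) / (2.0 * s ** 2))           # (height,)
        maps.append(np.repeat(col[:, None], width, axis=1))  # (height, width)
    return np.stack(maps)


def append_priors(features, priors):
    """Concatenate prior maps to a (C, H, W) feature map as extra channels."""
    return np.concatenate([features, priors], axis=0)
```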
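The decoded features are passed through a bidirectional convolutional LSTM. A minimal forward pass of that structure is sketched below in NumPy: a ConvLSTM cell whose gates come from a "same"-padded convolution over the concatenated input and hidden state, run once forward and once backward over the clip, with the two hidden states concatenated per frame. This is a simplified variant without peephole connections, using random untrained weights; every function name here is hypothetical, and the readout from hidden states to a saliency map is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, w, b):
    """'Same'-padded 2D cross-correlation. x: (Cin, H, W), w: (Cout, Cin, k, k), b: (Cout,)."""
    cout, _, k, _ = w.shape
    pad = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.broadcast_to(b[:, None, None], (cout, height, width)).copy()
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + height, dx:dx + width]  # (Cin, H, W)
            out += np.tensordot(w[:, :, dy, dx], patch, axes=([1], [0]))
    return out


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update (no peephole terms)."""
    z = conv2d_same(np.concatenate([x, h], axis=0), w, b)  # (4 * hidden, H, W)
    i, f, o, g = np.split(z, 4, axis=0)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)  # new cell state
    h = o * np.tanh(c)          # new hidden state
    return h, c


def init_params(in_channels, hidden, kernel, rng):
    """Random, untrained weights for one direction (illustration only)."""
    w = rng.normal(scale=0.2, size=(4 * hidden, in_channels + hidden, kernel, kernel))
    return w, np.zeros(4 * hidden)


def run_direction(seq, params, hidden):
    """Run the ConvLSTM over a list of (Cin, H, W) frames; return all hidden states."""
    w, b = params
    _, height, width = seq[0].shape
    h = np.zeros((hidden, height, width))
    c = np.zeros_like(h)
    outputs = []
    for x in seq:
        h, c = convlstm_step(x, h, c, w, b)
        outputs.append(h)
    return outputs


def bidirectional_convlstm(seq, fwd_params, bwd_params, hidden):
    """Concatenate forward and backward hidden states per frame: (2 * hidden, H, W)."""
    fwd = run_direction(seq, fwd_params, hidden)
    bwd = run_direction(seq[::-1], bwd_params, hidden)[::-1]  # re-align to frame order
    return [np.concatenate([hf, hb], axis=0) for hf, hb in zip(fwd, bwd)]
```

Because of the backward pass, the output for an early frame changes when a later frame changes, while the forward half of that output does not; this is what lets the prediction for frame t use future context.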
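The abstract reports comparisons on two public datasets without naming the evaluation metrics. Two metrics widely used in saliency evaluation are the linear correlation coefficient (CC) and normalized scanpath saliency (NSS); a plain NumPy sketch of both follows. They are shown in their planar form, with no sphere-aware weighting, and are not necessarily the metrics used in the paper.

```python
import numpy as np


def cc(pred, gt):
    """Pearson linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / pred.std()
    g = (gt - gt.mean()) / gt.std()
    return float((p * g).mean())


def nss(pred, fixations):
    """Normalized scanpath saliency: mean standardized prediction at fixated pixels.

    fixations: binary map of the same shape as pred (nonzero = fixated).
    """
    p = (pred - pred.mean()) / pred.std()
    return float(p[fixations.astype(bool)].mean())
```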

Updated: 2020-01-01
