Optimizing Fixation Prediction Using Recurrent Neural Networks for 360° Video Streaming in Head-Mounted Virtual Reality
IEEE Transactions on Multimedia (IF 7.3), Pub Date: 2020-03-01, DOI: 10.1109/tmm.2019.2931807
Ching-Ling Fan, Shou-Cheng Yen, Chun-Ying Huang, Cheng-Hsin Hsu

We study the problem of predicting the viewing probability of different parts of $360^{\circ}$ videos when streaming them to head-mounted displays. We propose a fixation prediction network based on a recurrent neural network, which leverages sensor and content features. The content features are derived by computer vision (CV) algorithms, which may suffer from inferior performance due to the various types of distortion introduced by diverse $360^{\circ}$ video projection models. We propose a unified approach with overlapping virtual viewports to eliminate such negative effects, and we evaluate our proposed solution using several CV algorithms, such as saliency detection, face detection, and object detection. We find that overlapping virtual viewports increase the performance of these existing CV algorithms, which were not trained for $360^{\circ}$ videos. We next fine-tune our fixation prediction network with diverse design options, including: 1) with or without overlapping virtual viewports, 2) with or without future content features, and 3) different feature sampling rates. We empirically choose the best fixation prediction network and use it in a $360^{\circ}$ video streaming system. We conduct extensive trace-driven simulations with a large-scale dataset to quantify the performance of the $360^{\circ}$ video streaming system with different fixation prediction algorithms. The results show that our proposed fixation prediction network outperforms other algorithms in several aspects, such as: 1) achieving comparable video quality (average gaps between −0.05 and 0.92 dB), 2) consuming much less bandwidth (average bandwidth reduction of up to 8 Mb/s), 3) reducing the rebuffering time (by 40 s on average in bandwidth-limited 4G cellular networks), and 4) running in real time (at most 124 ms).
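To make the architecture concrete, the following is a minimal sketch of an RNN-based fixation predictor that fuses per-frame sensor features (e.g., HMD orientation) with content features (e.g., pooled saliency), written in PyTorch. The class name `FixationRNN`, the LSTM cell, and all dimensions and tile counts are illustrative assumptions, not the paper's actual network.

```python
# Sketch: fuse sensor and content features, run them through a recurrent
# layer, and emit per-tile viewing probabilities. Names and sizes are
# illustrative; the paper's architecture may differ.
import torch
import torch.nn as nn

class FixationRNN(nn.Module):
    def __init__(self, sensor_dim=4, content_dim=64, hidden_dim=128, num_tiles=192):
        super().__init__()
        # Concatenate sensor and content features before the recurrent layer.
        self.rnn = nn.LSTM(sensor_dim + content_dim, hidden_dim, batch_first=True)
        # Map the final hidden state to one viewing probability per tile.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, num_tiles),
            nn.Sigmoid(),  # probability that each tile falls in the viewport
        )

    def forward(self, sensor_feats, content_feats):
        # sensor_feats:  (batch, seq_len, sensor_dim)   e.g., quaternions
        # content_feats: (batch, seq_len, content_dim)  e.g., pooled saliency
        x = torch.cat([sensor_feats, content_feats], dim=-1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # (batch, num_tiles)

# Usage: predict tile probabilities from one second of features at 30 Hz.
model = FixationRNN()
sensor = torch.randn(2, 30, 4)
content = torch.randn(2, 30, 64)
probs = model(sensor, content)  # (2, 192), one probability per tile
```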
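The overlapping-virtual-viewport idea can likewise be sketched: render overlapping low-distortion rectilinear views from the equirectangular frame, run off-the-shelf CV models on each view, and merge the responses back onto the sphere. The gnomonic projection and the 90° field-of-view layout below are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: sample overlapping rectilinear (gnomonic) viewports from an
# equirectangular frame so that CV models trained on regular images see
# low-distortion inputs. Layout and FOV are illustrative assumptions.
import numpy as np
import cv2

def gnomonic_viewport(equi, yaw, pitch, fov_deg=90.0, size=256):
    """Render one rectilinear viewport centered at (yaw, pitch) in radians."""
    h, w = equi.shape[:2]
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    xs, ys = np.meshgrid(np.arange(size) - size / 2,
                         np.arange(size) - size / 2)
    # Unit rays through the virtual image plane, then rotate by pitch and yaw.
    dirs = np.stack([xs, ys, np.full(xs.shape, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # pitch
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])  # yaw
    dirs = dirs @ (rot_y @ rot_x).T
    # Convert each ray to longitude/latitude, then to equirect coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))
    map_x = ((lon / (2 * np.pi) + 0.5) * w).astype(np.float32)
    map_y = ((lat / np.pi + 0.5) * h).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)

# Overlapping layout: 90° viewports every 45° in yaw at three pitches, so
# every region of the sphere is covered by several low-distortion views.
frame = np.zeros((960, 1920, 3), dtype=np.uint8)  # placeholder frame
viewports = [gnomonic_viewport(frame, np.radians(y), np.radians(p))
             for y in range(0, 360, 45) for p in (-45, 0, 45)]
# Run saliency/face/object detectors on each viewport, then project the
# responses back onto the equirectangular frame and merge the overlaps.
```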

Updated: 2020-03-01