CMP-based saliency model for stereoscopic omnidirectional images
Introduction
The pursuit of immersive visual experiences that simulate the real world has become a hot topic [1]. With the development of 360° imaging systems, stereoscopic omnidirectional visual content has attracted more and more attention from academia and industry because of its immersion, binocular perception and interactivity [2]. Compared with traditional image technologies, one of the prominent features of stereoscopic omnidirectional image (SOI) technologies is to provide an interactive wide field of view (FOV) displayed with a head mounted display (HMD). The omnidirectional media format (OMAF) has been standardized by MPEG [3]; it uses a coordinate system consisting of a unit sphere and three coordinate axes. The omnidirectional image (OI)/SOI covers the whole inner sphere, called the spherical format of the OI/SOI, and with an HMD the user can freely select and view local regions of the OI/SOI as viewport images [4], [5]. For OI/SOI, in addition to the spherical format, MPEG also specifies two representative formats, namely equi-rectangular projection (ERP) and cube map projection (CMP), for compressing and transmitting omnidirectional media content [3]. Most existing saliency models for OIs operate directly on OIs in ERP format [6], [7], [8]. However, ERP causes serious shape distortion in the SOI, and an SOI in ERP format (here called an ERP-mapped SOI) is not consistent with the content subjectively viewed in the viewport of an HMD. In contrast, an SOI in CMP format (a CMP-mapped SOI) consists of six faces with less shape distortion [1], which is similar to the viewport seen when viewing an OI/SOI with an HMD. Therefore, this paper focuses on a CMP-based saliency model for SOIs.
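The ERP mapping mentioned above can be sketched as a direct correspondence between pixel coordinates and longitude/latitude on the unit sphere. The sketch below is illustrative: the pixel-centre offset and axis conventions are assumptions for the example, not the normative OMAF definitions.

```python
import math

def erp_pixel_to_sphere(u, v, width, height):
    """Map an ERP pixel (u, v) to (longitude, latitude) on the unit sphere.

    Assumed convention: longitude spans [-pi, pi) across the image width,
    latitude spans [pi/2, -pi/2] from top row to bottom row.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat

def sphere_to_xyz(lon, lat):
    """Convert (longitude, latitude) to a 3D point on the unit sphere."""
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z
```

Under these conventions the centre pixel of the ERP image maps to longitude 0, latitude 0, i.e. the point (1, 0, 0) on the sphere; rows near the top and bottom of the image map to shrinking latitude circles, which is exactly the pole oversampling that motivates CMP.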
The visual attention mechanism of the human visual system (HVS) tends to selectively focus on the more interesting parts of the visual environment and ignore the rest, so that the limited visual processing resources can be allocated effectively [9]. Saliency prediction simulates the HVS's visual attention mechanism, so that more attention and more complex processing can be applied to salient regions. It is greatly beneficial to numerous applications in image processing and computer vision, including image/video enhancement, compression, segmentation, quality assessment, recognition, and so on [10], [11], [12]. Most existing saliency models are designed for 2D images. However, unlike 2D images, OIs have no boundaries because of their 360° × 180° FOV. It is therefore not appropriate to directly apply existing 2D image saliency models to SOIs; saliency models for SOIs need to be specially designed for SOIs' particularities.
To better match the image content viewed with an HMD, a novel CMP-based saliency model for SOIs is proposed in this paper. Considering that color information extracted by tensor decomposition preserves the inner structure of the color data, we compute the first sub-band of the tensor decomposition of the CMP-mapped SOI to extract its color features. Meanwhile, we compute depth features and subsequently estimate the foreground image from the extracted color and depth information. In addition, to better suppress the background regions in the top and bottom faces of the CMP-mapped SOI, a corresponding foreground prior method is proposed. The main contributions of this paper are summarized as follows:
1) A novel CMP-based saliency model for SOIs is proposed, considering the serious shape distortion of ERP-mapped SOIs and the particular way SOIs are viewed with an HMD. Moreover, to reduce the boundary effect between the faces of the CMP-mapped SOI, the horizontal global face and the horizontal local faces are shifted and their corresponding saliency maps are estimated. To establish the correlations between the faces of the CMP-mapped SOI, the horizontal global face saliency map is used to weight the horizontal local face saliency maps. Thereby, the saliency map is more consistent with the visual perception of viewing with an HMD.
2) Considering the integrity of the inner structure of the color features in an SOI, the first sub-band of the tensor decomposition of the CMP-mapped SOI is used to extract the color features, and the computed color distances are combined with the corresponding depth distances to calculate the saliency map. Thus, stereoscopic perception characteristics are better taken into account. In addition, the foreground prior method is adopted to address the difficulty of predicting the background of an SOI.
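The color-feature step above can be illustrated with a simplified stand-in: treating the H × W × 3 image as a 3-order tensor and taking the first component of an SVD along the color mode. This is only a sketch of the idea of a "first sub-band" of a tensor decomposition; the paper's exact decomposition is not reproduced here.

```python
import numpy as np

def first_color_subband(image):
    """Project an H x W x 3 float image onto the dominant color direction
    found by a mode-3 (color-mode) SVD.

    This is an illustrative stand-in for extracting the first sub-band of
    a tensor decomposition, not the authors' exact method.
    Returns an H x W array.
    """
    h, w, c = image.shape
    # Mode-3 unfolding: each row of `unfolded` is one flattened color channel.
    unfolded = image.reshape(h * w, c).T            # shape (3, H*W)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    # The first left singular vector is the dominant color direction;
    # projecting the unfolded tensor onto it gives the first sub-band.
    return (u[:, 0] @ unfolded).reshape(h, w)
```

For a grayscale image (all three channels equal), this sub-band is simply a scaled copy of the intensity channel, which matches the intuition that the first sub-band keeps the dominant structure of the color data.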
The rest of the paper is organized as follows. Section 2 discusses related work on saliency detection. Section 3 describes the proposed SOI saliency model in detail. Section 4 experimentally analyzes the proposed method and compares it with some state-of-the-art methods. Finally, Section 5 summarizes the paper.
Section snippets
Projection transformations of omnidirectional image
ERP and CMP are two of the main projection transformations in omnidirectional image (OI) processing [1], [3]. For ERP, the horizontal and vertical coordinates correspond to the longitude and latitude of the sphere, respectively. ERP is intuitive and is the most commonly used projection method. However, its shortcomings are also obvious. As shown in Fig. 1(a), there are large shape distortions near the poles because of oversampling, which results in wasted bitrate. Moreover, due to …
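The CMP counterpart maps each viewing direction onto one of six cube faces. A minimal sketch of the face-selection step is shown below; the face labels (+X, -X, …) and in-face axis orientations are illustrative assumptions, not the exact OMAF face layout.

```python
def xyz_to_cube_face(x, y, z):
    """Map a unit direction (x, y, z) to one of six cube-map faces and its
    (u, v) coordinates in [-1, 1] on that face.

    Face naming and axis orientations are illustrative, not the normative
    OMAF labelling: the dominant axis of the direction picks the face.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:          # +X or -X face
        face = '+X' if x > 0 else '-X'
        u, v = (-z / ax if x > 0 else z / ax), -y / ax
    elif ay >= ax and ay >= az:        # +Y or -Y face
        face = '+Y' if y > 0 else '-Y'
        u, v = x / ay, (z / ay if y > 0 else -z / ay)
    else:                              # +Z or -Z face
        face = '+Z' if z > 0 else '-Z'
        u, v = (x / az if z > 0 else -x / az), -y / az
    return face, u, v
```

Because each face only covers a 90° × 90° view, sampling within a face stays close to uniform, which is why CMP avoids the pole oversampling of ERP.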
The proposed CMP-based saliency model for SOI
Based on the above analyses, a novel CMP-based saliency prediction method for SOIs is proposed and the corresponding CMP-based saliency model for SOIs is established in this paper; its framework is shown in Fig. 4. Firstly, an SOI is transformed from ERP format to CMP format, and its four horizontal faces as well as the top and bottom faces are obtained to form the six faces of the CMP format. Then, to reduce the boundary effect of the CMP-mapped SOI, the four horizontal faces are shifted to obtain …
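The face-shifting step can be sketched as a circular shift of the 360° strip formed by the four horizontal faces, so that the original face boundaries move to the centres of the new faces. This is a sketch of the boundary-reduction idea under the assumption of a half-face-width shift, not the authors' exact implementation.

```python
import numpy as np

def shift_horizontal_faces(faces):
    """Circularly shift the four horizontal CMP faces by half a face width.

    `faces` is a list of four H x W arrays in left-to-right viewing order.
    The four faces wrap around horizontally, so concatenating them forms a
    seamless 360-degree strip; rolling that strip moves the original face
    boundaries to the centres of the returned faces.
    """
    strip = np.concatenate(faces, axis=1)         # 360-degree strip
    w = faces[0].shape[1]
    shifted = np.roll(strip, -w // 2, axis=1)     # shift by half a face
    return [shifted[:, i * w:(i + 1) * w] for i in range(4)]
```

After the shift, saliency can be estimated on faces whose interiors contain the former seams, so content that straddled a boundary is now processed as a whole.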
Experimental results and discussions
In this section, the proposed model is tested on the public ODI database for SOIs using the toolbox in [38]. In addition, extensive experiments are carried out to verify the effectiveness of the proposed model in comparison with other state-of-the-art saliency models.
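Saliency models are commonly compared with metrics such as the linear correlation coefficient (CC) and KL divergence between the predicted map and the ground-truth fixation density. The sketch below shows these two standard metrics in a minimal form; the toolbox in [38] provides the reference implementations actually used for evaluation.

```python
import numpy as np

def pearson_cc(pred, gt):
    """Linear correlation coefficient (CC) between a predicted saliency map
    and a ground-truth density map; 1.0 means perfect linear agreement."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())

def kl_divergence(pred, gt, eps=1e-12):
    """KL divergence from the ground-truth distribution to the prediction;
    both maps are normalised to sum to 1, and lower is better."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())
```

A prediction identical to the ground truth scores CC ≈ 1 and KL ≈ 0, which gives a quick sanity check when wiring up an evaluation pipeline.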
Conclusion
In this work, a saliency model based on cube map projection (CMP) for stereoscopic omnidirectional images has been proposed. The proposed model consists of two parts: horizontal face and vertical face saliency prediction. Specifically, the horizontal face saliency map is obtained by weighting the horizontal global face saliency map onto the horizontal local face saliency maps; the vertical face saliency map is obtained by the foreground prior method, which uses the obtained face saliency map as the …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the Natural Science Foundation of China under Grant Nos. 61671258, 61871247, and 61931022. It was also sponsored by the K.C. Wong Magna Fund of Ningbo University.
References (38)
- et al., Recent advances in omnidirectional video coding for virtual reality: projection and evaluation, Signal Process. (2018).
- et al., Utility-oriented resource allocation for 360-degree video transmission over heterogeneous networks, Digit. Signal Process. (2019).
- et al., A novel superpixel-based saliency detection model for 360-degree images, Signal Process. Image Commun. (2018).
- et al., A saliency prediction model on 360 degree images using color dictionary based sparse, Signal Process. Image Commun. (2018).
- et al., Scanpath and saliency prediction on 360 degree images, Signal Process. Image Commun. (2018).
- et al., Visual acuity inspired saliency detection by using sparse features, Inf. Sci. (2015).
- et al., A saliency-based search mechanism for overt and covert shifts of visual attention, Vis. Res. (2000).
- et al., A feature-integration theory of attention, Cogn. Psychol. (1980).
- et al., A proto-object based saliency model in three-dimensional space, Vis. Res. (2016).
- et al., Stereoscopic saliency estimation with background priors based deep reconstruction, Neurocomputing (2018).
- GBVS360, BMS360, ProSal: extending existing saliency prediction models from 2D to omnidirectional images, Signal Process. Image Commun.
- The prediction of head and eye movement for 360 degree images, Signal Process. Image Commun.
- A feature-based approach for saliency estimation of omni-directional images, Signal Process. Image Commun.
- Components of bottom-up gaze allocation in natural images, Vis. Res.
- Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360° still images, Signal Process. Image Commun.
- Parallax360: stereoscopic 360° scene representation for head-motion parallax, IEEE Trans. Vis. Comput. Graph.
- Low-cost 360 stereo photography and video capture, ACM Trans. Graph.
- Saliency detection via graph-based manifold ranking, IEEE Conf. Comput. Vis. Pattern Recognit.
Junjun Zhang was born in 1995. She is currently working towards the M.E. degree at Ningbo University. Her research interests are visual saliency and visual comfort assessment.
Mei Yu received the B.S. and M.S. degrees from Hangzhou Institute of Electronics Engineering, China, in 1990 and 1993, and the Ph.D. degree from Ajou University, South Korea, in 2000. She is currently a professor with the Faculty of Information Science and Engineering, Ningbo University, China. Her research interests mainly include image video coding and visual perception.
Gangyi Jiang received the M.S. degree from Hangzhou University, China, in 1992 and the Ph.D. degree from Ajou University, South Korea, in 2000. He is currently a professor with the Faculty of Information Science and Engineering, Ningbo University, China. He is a senior member of the IEEE. He has authored over 100 technical articles in refereed journals. His research interests mainly include visual communication and image video processing, 3D video coding, omnidirectional video processing, light field imaging, visual perception and quality assessment.
Yubin Qi received the B.S. degree from Ningbo University, Ningbo, China, in 2017. He is currently pursuing the M.S. degree at Ningbo University. His current research interest lies in perceptual image processing.