CMP-based saliency model for stereoscopic omnidirectional images

https://doi.org/10.1016/j.dsp.2020.102708

Highlights

  • A novel cube map projection (CMP) based saliency model for stereoscopic omnidirectional images (SOI) is presented.

  • The horizontal global face and horizontal local faces are moved to reduce the boundary effect between CMP faces.

  • The horizontal global face saliency map is used to weight the horizontal local face saliency maps.

  • The first sub-band in the tensor domain of the SOI is used to extract color features.

  • A foreground prior method is designed to suppress background regions near the poles of the SOI.

Abstract

Although many saliency models for images have been proposed, saliency modeling for stereoscopic omnidirectional images (SOIs) remains an important but not yet deeply studied topic. In addition, one of the main disadvantages of existing saliency models is that they struggle to simulate the characteristics of the wide field of view (FOV) of SOIs. To solve these problems, this paper proposes a novel cube map projection (CMP) based saliency model for SOIs. Considering that the equi-rectangular projection (ERP) representation of an SOI introduces obvious stretch distortion, and that what the human eyes actually view through a head mounted display (HMD) are viewport images, we construct the saliency model of the SOI in its less shape-distorted CMP plane. First, the SOI in ERP format is converted into CMP format. To reduce the boundary effect between the faces in the CMP plane, its four horizontal faces are moved to obtain the horizontal global face and the horizontal local faces, for which the corresponding saliency maps are subsequently computed by combining color distance and depth distance in a graph model. Then, to establish the correlations among these faces, the horizontal face saliency map is calculated by weighting the horizontal global face saliency map onto the horizontal local face saliency maps. Meanwhile, the vertical face saliency map is defined and obtained. Finally, the horizontal and vertical face saliency maps are spliced and re-projected to the ERP plane to obtain the final saliency map of the ERP-mapped SOI. Experiments on the public ODI database compare the proposed method with state-of-the-art methods, and the results show that the proposed method achieves better performance in estimating SOI saliency maps in terms of six well-known quantitative metrics.

Introduction

The pursuit of immersive visual experiences that simulate the real world has become a hot topic [1]. With the development of 360° imaging systems, stereoscopic omnidirectional visual content has gradually attracted more and more attention from academia and industry because of its immersion, binocular perception and interactivity [2]. Compared with traditional image technologies, one of the prominent features of stereoscopic omnidirectional image (SOI) technologies is that they provide an interactive wide field of view (FOV) displayed with a head mounted display (HMD). The omnidirectional media format (OMAF) has been standardized by MPEG [3]; it uses a coordinate system with a unit sphere and three coordinate axes. An omnidirectional image (OI)/SOI covers the whole inner sphere, called the spherical format of the OI/SOI, and the user can freely select and view local regions of the OI/SOI as viewport images with an HMD [4], [5]. In addition to the spherical format, MPEG also specifies two representative formats for OIs/SOIs, namely equi-rectangular projection (ERP) and cube map projection (CMP), for compressing and transmitting omnidirectional media content [3]. Most existing saliency models for OIs operate directly on OIs in ERP format [6], [7], [8]. However, ERP causes serious shape distortion in the SOI, and an SOI in ERP format (here called an ERP-mapped SOI) is not consistent with the content subjectively viewed in the viewport of an HMD. In contrast, an SOI in CMP format (called a CMP-mapped SOI) consists of six faces with less shape distortion [1], which is similar to the viewport when viewing an OI/SOI with an HMD. Therefore, this paper focuses on a CMP-based saliency model for SOIs.

The visual attention mechanism of the human visual system (HVS) tends to selectively focus on the more interesting parts of the visual environment and ignore the rest, so that limited visual processing resources can be allocated effectively [9]. Saliency prediction simulates the HVS's visual attention mechanism, so that more attention and more complex processing can be devoted to salient regions. It greatly benefits numerous applications in image processing and computer vision, including image/video enhancement, compression, segmentation, quality assessment, recognition, and so on [10], [11], [12]. Most existing saliency models are designed for 2D images. However, unlike 2D images, OIs have no boundaries because of their wide 360° × 180° FOV. Therefore, it is not appropriate to directly apply existing 2D image saliency models to SOIs; saliency models for SOIs need to be specially designed around their particularities.

To better match the image content viewed with an HMD, a novel CMP-based saliency model for SOIs is proposed in this paper. Considering that color information extracted by tensor decomposition preserves the inner structure of the color data, we compute the first sub-band of the tensor decomposition of the CMP-mapped SOI to extract its color features. Meanwhile, we compute depth features and subsequently estimate the foreground image from the extracted color and depth information. In addition, to better suppress background regions in the top and bottom faces of the CMP-mapped SOI, a corresponding foreground prior method is proposed. The main contributions of this paper are summarized as follows:

1) A novel CMP-based saliency model for SOIs is proposed, considering the serious shape distortion of ERP-mapped SOIs and the particular way SOIs are viewed with an HMD. Moreover, to reduce the boundary effect between the faces of the CMP-mapped SOI, the horizontal global face and the horizontal local faces are moved and their corresponding saliency maps are estimated. To establish the correlations between the faces of the CMP-mapped SOI, the horizontal global face saliency map is used to weight the horizontal local face saliency maps. The resulting saliency map is thereby more consistent with the visual perception of viewing with an HMD.

2) Considering the integrity of the inner structure of the color features in an SOI, the first sub-band of the tensor decomposition of the CMP-mapped SOI is used to extract color features, and the calculated color distance is combined with the corresponding depth distance to compute the saliency map. Thus, stereoscopic perception characteristics are better accounted for. In addition, a foreground prior method is adopted to address the difficulty of predicting the background of an SOI.
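As an illustration of the color-feature step, the following is a minimal sketch of one plausible reading of the "first sub-band" of a tensor decomposition, using an HOSVD-style mode-3 (color-mode) unfolding and the dominant singular direction; the function name is ours, and the exact decomposition used in the paper may differ:

```python
import numpy as np

def first_tensor_subband(img):
    """Approximate first sub-band of an RGB image under an HOSVD-style decomposition.

    Unfolds the H x W x 3 tensor along the color mode, takes the first left
    singular vector of the resulting 3 x (H*W) matrix, and projects the color
    channels onto it, yielding a single H x W feature map that retains the
    dominant structure of the color data.
    """
    H, W, C = img.shape
    unfolded = img.reshape(H * W, C).T              # mode-3 unfolding: C x (H*W)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    principal = u[:, 0]                             # dominant color direction
    return (img.reshape(H * W, C) @ principal).reshape(H, W)
```

For a grayscale-like input (all three channels equal), the map reduces, up to sign and a constant factor, to the image intensity, which is consistent with the idea of preserving the inner structure of the color data.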

The rest of the paper is organized as follows. Section 2 discusses related work on saliency detection. Section 3 describes the proposed SOI saliency model in detail. Section 4 experimentally analyzes and discusses the proposed method in comparison with some state-of-the-art methods. Finally, Section 5 summarizes the paper.

Section snippets

Projection transformations of omnidirectional image

ERP and CMP are two of the main projection transformations in omnidirectional image (OI) processing [1], [3]. For ERP, the horizontal and vertical coordinates correspond to the longitude and latitude of the sphere, respectively. ERP is intuitive and is the most commonly used projection method. However, its shortcomings are also obvious. As shown in Fig. 1(a), there are large shape distortions near the poles because of oversampling, which results in wasted bitrate. Moreover, due to
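The ERP coordinate convention described above can be sketched as follows (a minimal illustration with a function name of our own choosing, not code from the paper): each pixel column maps linearly to longitude and each row to latitude, from which the corresponding point on the unit sphere follows directly.

```python
import math

def erp_pixel_to_sphere(u, v, width, height):
    """Map an ERP pixel (u, v) to longitude/latitude and a 3D unit vector.

    u in [0, width) maps linearly to longitude in [-pi, pi);
    v in [0, height) maps linearly to latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    # Spherical -> Cartesian on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return lon, lat, (x, y, z)
```

Because every image row spans the full 2π of longitude regardless of latitude, rows near the poles sample the sphere far more densely than rows at the equator, which is exactly the oversampling that wastes bitrate.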

The proposed CMP-based saliency model for SOI

Based on the above analyses, a novel CMP-based saliency prediction method for SOIs is proposed and the corresponding CMP-based saliency model is established; its framework is shown in Fig. 4. First, an SOI is transformed from ERP format to CMP format, and its four horizontal faces as well as the top and bottom faces are obtained to form the six faces of the CMP format. Then, to reduce the boundary effect of the CMP-mapped SOI, the four horizontal faces are moved to obtain
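The first step of this pipeline, resampling a cube face from the ERP image, can be illustrated with a minimal nearest-neighbor sketch for the front face only (the function name and the assumption that the front face is centered at longitude 0, latitude 0 are ours; this is not the paper's implementation, and the other five faces follow by rotating the ray directions):

```python
import numpy as np

def sample_cmp_front_face(erp, face_size):
    """Resample the front (+x) cube face from an ERP image (nearest neighbor).

    erp: H x W (x C) numpy array in ERP format.
    Returns a face_size x face_size (x C) array.
    """
    H, W = erp.shape[:2]
    # Face-plane coordinates in [-1, 1] for each output pixel center.
    a = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    yy, zz = np.meshgrid(a, -a)          # right / up on the face plane
    xx = np.ones_like(yy)                # face plane at x = 1
    # Ray direction -> longitude/latitude.
    lon = np.arctan2(yy, xx)
    lat = np.arctan2(zz, np.hypot(xx, yy))
    # Longitude/latitude -> nearest ERP pixel indices.
    u = ((lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = ((np.pi / 2 - lat) / np.pi * H).astype(int).clip(0, H - 1)
    return erp[v, u]
```

Each face covers only a 90° x 90° FOV, so per-face shape distortion stays small, which is why the CMP plane better matches the viewport content seen through an HMD.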

Experimental results and discussions

In this section, using the toolbox in [38], the proposed model is tested on the public ODI database for SOIs. Extensive experiments are also carried out to verify the effectiveness of the proposed model in comparison with other state-of-the-art saliency models.

Conclusion

In this work, a saliency model based on cube map projection (CMP) for stereoscopic omnidirectional images has been proposed. The proposed model consists of two parts: horizontal face and vertical face saliency prediction. Specifically, the horizontal face saliency map is obtained by weighting the horizontal global face saliency map onto the horizontal local face saliency maps; the vertical face saliency map is obtained by the foreground prior method, which uses the obtained face V2 saliency map as the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant Nos. 61671258, 61871247, and 61931022. It was also sponsored by the K.C. Wong Magna Fund of Ningbo University.

Junjun Zhang was born in 1995. She is currently working towards the M.E. degree at Ningbo University. Her research interests are visual saliency and visual comfort assessment.

References (38)

  • P. Lebreton et al., GBVS360, BMS360, ProSal: extending existing saliency prediction models from 2D to omnidirectional images, Signal Process. Image Commun. (2018)
  • Y. Zhu et al., The prediction of head and eye movement for 360 degree images, Signal Process. Image Commun. (2018)
  • F. Battisti et al., A feature-based approach for saliency estimation of omni-directional images, Signal Process. Image Commun. (2018)
  • R.J. Peters et al., Components of bottom-up gaze allocation in natural images, Vis. Res. (2005)
  • J. Gutiérrez et al., Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360° still images, Signal Process. Image Commun. (2018)
  • B. Luo et al., Parallax360: stereoscopic 360° scene representation for head-motion parallax, IEEE Trans. Vis. Comput. Graph. (2018)
  • ISO/IEC FDIS 23090-12:201x(E), Information technology: coded representation of immersive media (MPEG-I) — Part 2:...
  • K. Matzen et al., Low-cost 360 stereo photography and video capture, ACM Trans. Graph. (2017)
  • C. Yang et al., Saliency detection via graph-based manifold ranking, IEEE Conf. Comput. Vis. Pattern Recognit. (2013)

    Mei Yu received the B.S. and M.S. degrees from Hangzhou Institute of Electronics Engineering, China, in 1990 and 1993, and the Ph.D. degree from Ajou University, South Korea, in 2000. She is currently a professor with the Faculty of Information Science and Engineering, Ningbo University, China. Her research interests mainly include image video coding and visual perception.

    Gangyi Jiang received the M.S. degree from Hangzhou University, China, in 1992 and the Ph.D. degree from Ajou University, South Korea, in 2000. He is currently a professor with the Faculty of Information Science and Engineering, Ningbo University, China. He is a senior member of the IEEE. He has authored over 100 technical articles in refereed journals. His research interests mainly include visual communication and image video processing, 3D video coding, omnidirectional video processing, light field imaging, visual perception and quality assessment.

    Yubin Qi received the B.S. degree from Ningbo University, Ningbo, China, in 2017. He is currently pursuing the M.S. degree at Ningbo University. His current research interest lies in perceptual image processing.
