Elsevier

Information Sciences

Volume 556, May 2021, Pages 209-222
Stereo superpixel: An iterative framework based on parallax consistency and collaborative optimization

https://doi.org/10.1016/j.ins.2020.12.031

Abstract

Stereo superpixel segmentation aims to segment the left and right views cooperatively and consistently, rather than segmenting each view independently. The correspondence between the two views should therefore be reasonably modeled and fully considered. In this paper, we propose a left-right interactive optimization framework for stereo superpixel segmentation. Considering the disparity in stereo image pairs, we first divide the images into paired and non-paired regions, and then propose a collaborative optimization scheme that coordinately refines the matched superpixels of the left and right views in an interactive manner. This is, to the best of our knowledge, the first attempt to generate stereo superpixels considering parallax consistency. Quantitative and qualitative experiments demonstrate that the proposed framework achieves superior consistency and accuracy compared with single-image superpixel segmentation.

Introduction

Superpixel segmentation aims to replace discretized pixels with homogeneous units as input primitives [1], which greatly reduces computational complexity and information redundancy. It is widely used in various tasks and industrial applications, e.g., saliency detection [2], [3], [4], object tracking [5], scene reconstruction [6], image enhancement [7], [8], [9], image data-hiding [10], [11], [12], [13], [14], image steganography [15], [16], [17], roughened surface detection [18], hydraulic-fracturing applications [19], remote sensing [20], and RGB-D salient object detection [21], [22], [23], [24], [25], [26]. In recent years, dual-camera systems have become increasingly popular and are widely used in mobile phones and autonomous vehicles. Moreover, stereo image pairs are more consistent with the human perception mechanism than a single image, and the information from the two views is complementary and correlated, which is conducive to scene representation and object modeling. However, superpixel segmentation for stereo image pairs is a challenging new problem, because both the consistency and the differences between the two viewpoints need to be considered jointly.

Superpixel segmentation of a single image has been studied for years, and many state-of-the-art methods have been proposed. According to their methodology, existing superpixel segmentation algorithms can be roughly divided into two categories: approaches based on graph theory and approaches based on clustering. Considering the image topological structure, topology preserved regular superpixel (TPS) [27] is a representative graph-based algorithm, which aims to produce relatively regular superpixels and preserve the topology to a certain extent. However, the regular topology damages object boundaries and reduces accuracy. More recently, approximately structural superpixels (ASS) [28] were proposed, generated by an asymmetric square-wise segmentation scheme that largely reduces the data amount while preserving image content boundaries. Simple linear iterative clustering (SLIC) [29] is a representative clustering-based method, which is widely applied in computer vision tasks due to its efficiency. It adopts the K-means clustering approach but restricts the search to a local area around each cluster center rather than searching globally as in standard K-means, which largely reduces the computational complexity. Fast linear iterative clustering (FLIC) [30], an extension of SLIC, is an active search method that emphasizes neighboring continuity to improve segmentation accuracy. Linear spectral clustering (LSC) [31] generates superpixels by mapping pixels into a high-dimensional feature space through a kernel function to segment the image more accurately. Different from the conventional methods, Superpixel Hierarchy (SH) [32] was recently proposed to represent images by reconstructing them from superpixels’ average values, which shows the quality of multi-scale superpixels intuitively. Later, Bayesian Adaptive Superpixel Segmentation (BASS) [33] generates superpixels with a Bayesian mixture model, and also adopts superpixel reconstruction to show the segmentation quality.
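
The local search that distinguishes SLIC from global K-means can be illustrated with a minimal sketch of a single assignment pass. This is not the reference implementation: it assumes a grayscale image instead of the usual color-plus-position feature, omits the center update and connectivity steps, and the names slic_assign, step, and m are chosen here for illustration only.

```python
import numpy as np

def slic_assign(image, centers, step, m=10.0):
    """One SLIC-style assignment pass on a grayscale image (illustrative only).

    centers: array of shape (K, 3) holding (row, col, intensity) per cluster.
    step:    grid interval S; each center searches only a window of about 2S x 2S.
    m:       compactness weight (assumed value) balancing color and spatial distance.
    """
    h, w = image.shape
    labels = np.full((h, w), -1, dtype=int)
    best = np.full((h, w), np.inf)
    for k, (cy, cx, ci) in enumerate(centers):
        # Local window around the center instead of the whole image.
        r0, r1 = max(int(cy) - step, 0), min(int(cy) + step + 1, h)
        c0, c1 = max(int(cx) - step, 0), min(int(cx) + step + 1, w)
        rows, cols = np.mgrid[r0:r1, c0:c1]
        d_color = (image[r0:r1, c0:c1] - ci) ** 2
        d_space = (rows - cy) ** 2 + (cols - cx) ** 2
        dist = d_color + (m / step) ** 2 * d_space  # squared combined distance
        # Slices are views, so the updates below modify `best` and `labels`.
        window_best = best[r0:r1, c0:c1]
        window_labels = labels[r0:r1, c0:c1]
        closer = dist < window_best
        window_best[closer] = dist[closer]
        window_labels[closer] = k
    return labels
```

Because each center touches only its own window, the cost of one pass grows with the number of pixels rather than with the number of pixels times the number of clusters, which is the saving the paragraph above refers to.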

For stereo image pairs, most applications simply segment the left and right images separately with one of the aforementioned single-image methods [34], [35]. However, this approach cannot be considered a genuine implementation of stereo superpixel segmentation, because it ignores the correspondence between the two views and does not help subsequent left-right collaborative processing tasks. Fig. 1 shows a simple comparison between separately segmenting a stereo image pair and collaborative superpixel segmentation by the proposed framework. We can see that collaborative segmentation is more consistent with human perception than separate segmentation. Therefore, it is necessary to take the collaborative relationship between the left and right views into consideration and to investigate a specialized method for stereo superpixel segmentation.

As mentioned earlier, the correspondence between stereo image pairs plays an important role in stereo superpixel segmentation, and it builds a bridge for modeling the left-right view relationships. To obtain the correspondence between a stereo image pair, matching is an effective and intuitive solution. Stereo matching methods have been investigated for many years to explore the correspondence between stereo image pairs [36], [37], [38]. These methods use different pixel-wise searches to find the nearest-neighbor matches between images. However, because they capture the correspondence from pixel to pixel, they suffer from high computational cost and are not efficient for superpixel matching. Therefore, patch matching [37] combined with superpixels [39], [40] was proposed as an evolved version. These methods match the superpixels in multiple images by using pixel-wise moves to find the closest matches, similar to patch matching. Subsequently, superpatch matching [41] was proposed to correlate superpixels across multiple images by patch-wise matching. Nevertheless, these methods do not take into account the occlusion caused by drastic disparity variation in stereo images. When matching, they select for each superpixel the most similar superpixel in the other image, which may introduce matching errors, since superpixels in occluded areas should not be matched at all. Therefore, fully considering the occlusion relationship to effectively model the correspondence is one of the problems to be solved for the stereo superpixel segmentation task.
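
The forced-match problem described above can be made concrete with a small sketch. It is a deliberately simplified stand-in for the cited matching methods: each superpixel is described only by its mean intensity (a hypothetical feature chosen for brevity), and the helper names superpixel_means and nearest_match are illustrative, not taken from any of the referenced works.

```python
import numpy as np

def superpixel_means(image, labels, n_segments):
    """Mean intensity of each superpixel in a grayscale image."""
    flat_labels = labels.ravel()
    sums = np.bincount(flat_labels, weights=image.ravel(), minlength=n_segments)
    counts = np.bincount(flat_labels, minlength=n_segments)
    return sums / np.maximum(counts, 1)  # guard against empty labels

def nearest_match(feat_left, feat_right):
    """For every left superpixel, the index of the most similar right superpixel.

    Every left superpixel receives a match, even one that is occluded in the
    right view and therefore has no true counterpart.
    """
    cost = np.abs(feat_left[:, None] - feat_right[None, :])
    return cost.argmin(axis=1)
```

For example, with left features 0.1, 0.5, 0.9 and right features 0.1, 0.8, the middle left superpixel is still assigned to the second right superpixel although neither candidate is close. This is the kind of error that an occlusion-aware scheme is meant to avoid.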

To address these problems, we propose an iterative superpixel segmentation framework for stereo image pairs based on interactive parallax consistency, in which occlusion-aware superpixel matching and left-right interactive optimization are designed. First, an existing superpixel segmentation method (e.g., SLIC [29], LSC [31]) is applied independently to the left and right images to generate the corresponding initial superpixels. Then, considering the parallax relationship between the left and right views, each image is divided into paired regions and non-paired regions. This division determines the occlusion status, so that occlusion can be modeled more reasonably in subsequent processing. Based on the region division, we integrate two modules in an iterative update framework, i.e., superpixel matching and collaborative optimization. The superpixel matching process performs local search and matching on the paired regions while leaving the non-paired regions unprocessed, and then generates a matching relation matrix, which effectively alleviates the matching errors caused by occlusion. The collaborative optimization process designs an energy function to coordinately refine the matched superpixels of the left and right images in an interactive manner, and generates more homogeneous superpixels with sharper boundaries.
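
This excerpt does not state how the paired and non-paired regions are computed. One common way to obtain such a division, shown here purely as an assumption and not as the authors' procedure, is a left-right disparity consistency check on rectified images. The function name paired_mask_left, the tolerance tol, and the sign convention d = x_left - x_right are all hypothetical choices for this sketch.

```python
import numpy as np

def paired_mask_left(disp_left, disp_right, tol=1.0):
    """Boolean mask of left-view pixels with a consistent right-view counterpart.

    Assumes rectified images and non-negative disparity d = x_left - x_right,
    so a left pixel at column x corresponds to column x - d in the right view.
    Pixels that map outside the right image, or whose two disparities disagree
    by more than `tol`, are treated as non-paired (e.g., occluded).
    """
    h, w = disp_left.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    x_right = np.rint(cols - disp_left).astype(int)  # matching column in right view
    inside = (x_right >= 0) & (x_right < w)
    x_safe = np.clip(x_right, 0, w - 1)              # keep indexing valid
    consistent = np.abs(disp_left - disp_right[rows, x_safe]) <= tol
    return inside & consistent
```

With a mask of this kind, a matching step could restrict its local search to superpixels that lie mostly inside the paired region and leave the remaining ones unmatched, which mirrors the role the region division plays in the framework described above.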

The main contributions of the proposed work are summarized as follows:

  • We propose a superpixel segmentation framework for stereo image pairs based on left-right interactive parallax consistency. The framework generates stereo superpixels by integrating the mutual information from both the left and right views and, to the best of our knowledge, is the first to convert conventional single-image superpixel segmentation methods to produce stereo superpixels.

  • We divide the images into paired and non-paired regions, and generate a matching relation matrix of stereo left-right superpixels with a local-search-based matching model that accounts for the occlusion caused by drastic disparity variation in stereo images.

  • We propose a collaborative optimization scheme that coordinately refines the matched superpixels of the left and right images in an interactive manner and generates more accurate and perceptually consistent superpixels.

  • Extensive experiments demonstrate that for stereo superpixel segmentation, the proposed collaborative optimization framework achieves superior performance compared with single-image superpixel segmentation both quantitatively and qualitatively.

Section snippets

Pipeline

As shown in Fig. 2, the proposed stereo superpixel framework includes three main components, i.e., disparity-guided region division, superpixel matching via local searching, and left-right collaborative optimization. According to the parallax relationship between the left and right images, we first jointly divide the left and right images into paired regions and non-paired regions. Then, we initialize the superpixels using a superpixel segmentation method

Datasets and parameters selection

Since there is no standard superpixel segmentation ground truth for stereo images, we choose one dataset with disparity ground truth and another dataset without any ground truth to evaluate our method extensively. For a fair comparison, we adopt the following publicly available stereo image datasets: the Middlebury stereo dataset [43] and the Flicker datasets [38]. Middlebury includes 21 image series with 7 views taken under three different illuminations, and with three different

Conclusion

In this paper, we have proposed a superpixel segmentation framework for stereo image pairs based on interactive parallax consistency for the first time. Considering the parallax relationship between the left and right views, the occlusion-aware superpixel matching and left-right interactive optimization are designed, instead of segmenting the stereo image pairs separately in the conventional approach. The superpixel matching process generates a matching matrix in paired region to alleviate the

CRediT authorship contribution statement

Hua Li: Conceptualization, Methodology, Software. Runmin Cong: Data curation, Conceptualization, Methodology. Sam Kwong: Supervision, Writing - review & editing. Chuanbo Chen: Supervision, Writing - review & editing. Qianqian Xu: Writing - review & editing. Chongyi Li: Writing - review & editing.

Acknowledgment

This work was supported in part by the Key Project of Science and Technology Innovation 2030 of the Ministry of Science and Technology of China under Grant 2018AAA0101301, in part by the Natural Science Foundation of China under Grants 61772344, 62002014, in part by the Hong Kong RGC General Research Funds under 9042816 (CityU 11209819), in part by the Beijing Nova Program under Grant Z201100006820016, in part by the Fundamental Research Funds for the Central Universities under Grant

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (46)

  • C. Guo et al., Hierarchical features driven residual learning for depth map super-resolution, IEEE Trans. Image Process. (2019)
  • F.S. Hassan et al., Efficient reversible data hiding multimedia technique based on smart image interpolation, Multimedia Tools Appl. (2020)
  • A. Gutub et al., Efficient implementation of multi-image secret hiding based on LSB and DWT steganography comparisons, Arab. J. Sci. Eng. (2020)
  • F.S. Hassan, A. Gutub, Novel embedding secrecy within images utilizing an improved interpolation-based reversible data...
  • T. AlKhodaidi et al., Trustworthy target key alteration helping counting-based secret sharing applicability, Arab. J. Sci. Eng. (2020)
  • A. Gutub, A. Al-Qurashi, Secure shares generation via m-blocks partitioning for counting-based secret sharing, J. Eng....
  • M.T. Parvez et al., Vibrant color image steganography using channel differences and secret data distribution, Kuwait J. Sci. Eng. (2011)
  • N. Al-Juaid et al., Combining RSA and audio steganography on personal computers for enhancing security, SN Appl. Sci. (2019)
  • A.A.-A. Gutub, Pixel indicator technique for RGB image steganography, J. Emerg. Technol. Web Intell. (2010)
  • B. Xiao et al., Effective thermal conductivity of porous media with roughened surfaces by Fractal-Monte Carlo simulations, Fractals (2020)
  • G. Long et al., A perforation-erosion model for hydraulic-fracturing applications, SPE Prod. Oper. (2018)
  • C. Li et al., Nested network with two-stream pyramid for salient object detection in optical remote sensing images, IEEE Trans. Geosci. Remote Sens. (2019)
  • R. Cong et al., Going from RGB to RGBD saliency: a depth-guided transformation model, IEEE Trans. Cybern. (2020)