Stereo superpixel: An iterative framework based on parallax consistency and collaborative optimization
Introduction
Superpixel segmentation aims to replace discretized pixels with homogeneous units as input primitives [1], which largely reduces computational complexity and information redundancy. It is widely used in various tasks and industrial applications, e.g., saliency detection [2], [3], [4], object tracking [5], scene reconstruction [6], image enhancement [7], [8], [9], image data-hiding [10], [11], [12], [13], [14], image steganography [15], [16], [17], roughened surface detection [18], hydraulic-fracturing applications [19], remote sensing [20], and RGB-D salient object detection [21], [22], [23], [24], [25], [26]. In recent years, dual-camera systems have become increasingly popular and are widely used in mobile phones and autonomous vehicles. Moreover, stereo image pairs are more consistent with the human perception scheme than a single image, and the information from the two views is complementary and correlated, which is conducive to scene representation and object modeling. However, superpixel segmentation for stereo image pairs is a challenging new proposition, because the information consistency and the differences between the two viewpoints need to be considered jointly.
Superpixel segmentation of single images has been studied for years, and many state-of-the-art methods have been proposed. According to the methodology, existing superpixel segmentation algorithms can be roughly divided into two categories: approaches based on graph theory and approaches based on clustering. Considering the image topological structure, topology preserved regular superpixel (TPS) [27] is a representative graph-based algorithm, which aims to produce relatively regular superpixels and to keep the topology to a certain extent. However, the regular topology destroys boundaries and reduces accuracy. Recently, approximately structural superpixels (ASS) [28] was proposed to generate approximately structural superpixels in an asymmetrically square-wise manner, which largely reduces the data amount while preserving image content boundaries. Simple linear iterative clustering (SLIC) [29] is a representative clustering-based method, widely applied in various computer vision tasks due to its efficiency. It adopts K-means clustering but restricts the search to a local area rather than the global search of K-means, which largely reduces computational complexity. Fast linear iterative clustering (FLIC) [30], an extension of SLIC, is an active search method that emphasizes neighboring continuity to improve segmentation accuracy. Linear spectral clustering (LSC) [31] generates superpixels by mapping pixels into a high-dimensional feature space through a kernel function to segment the image more accurately. Different from the conventional methods, Superpixel Hierarchy (SH) [32] was recently proposed to represent images with the reconstruction of superpixels' average values, which shows the quality of multi-scale superpixels intuitively. Later, Bayesian Adaptive Superpixel Segmentation (BASS) [33] generates superpixels with a Bayesian mixture model and also adopts superpixel reconstruction to show the segmentation quality.
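To make the local-search idea behind SLIC-style clustering concrete, the following is a minimal, simplified sketch (not the implementation of any cited method): K-means in a joint color-position space, with each cluster's assignment restricted to a 2S x 2S window around its center. The function name `slic_sketch` and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def slic_sketch(image, k=4, iters=5, m=10.0):
    """Minimal SLIC-style clustering: K-means in joint color-position space,
    with each cluster's search restricted to a 2S x 2S local window."""
    h, w = image.shape[:2]
    S = int(np.sqrt(h * w / k))            # grid interval between seeds
    centers = [(float(y), float(x), *image[y, x])
               for y in range(S // 2, h, S)
               for x in range(S // 2, w, S)]
    labels = -np.ones((h, w), dtype=int)
    for _ in range(iters):
        dists = np.full((h, w), np.inf)
        # assignment step: each center claims pixels only in its local window
        for ci, (cy, cx, *cc) in enumerate(centers):
            y0, y1 = max(0, int(cy) - S), min(h, int(cy) + S)
            x0, x1 = max(0, int(cx) - S), min(w, int(cx) + S)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            dc = np.linalg.norm(image[y0:y1, x0:x1] - np.array(cc), axis=-1)
            ds = np.hypot(yy - cy, xx - cx)
            d = np.hypot(dc, ds / S * m)   # combined color + scaled spatial distance
            better = d < dists[y0:y1, x0:x1]
            dists[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = ci
        # update step: move each center to the mean position of its pixels
        for ci in range(len(centers)):
            pts = np.argwhere(labels == ci)
            if len(pts):
                cy, cx = pts.mean(axis=0)
                centers[ci] = (cy, cx, *image[int(cy), int(cx)])
    return labels
```

Restricting the assignment to local windows is what makes SLIC-style clustering roughly linear in the number of pixels, in contrast to global K-means.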
For the superpixel segmentation of stereo image pairs, the common practice in applications is to segment the left and right images separately with an appropriate single-image segmentation method from those mentioned above [34], [35]. However, this approach cannot be considered an effective and genuine implementation of stereo superpixel segmentation, because the correspondence between the two views is ignored, and it does not help the subsequent left-right collaborative processing tasks. Fig. 1 shows a simple comparison between segmenting the stereo image pair separately and collaborative superpixel segmentation by the proposed framework. We can see that collaborative segmentation is more consistent with human perception than separate segmentation. Therefore, it is necessary to take the collaborative relationship between the left and right views into consideration and to investigate a specialized method for stereo superpixel segmentation.
As mentioned earlier, the correspondence between stereo image pairs plays an important role in stereo superpixel segmentation, and it builds a bridge for modeling the left-right view relationships. To obtain the correspondence between a stereo image pair, matching is an effective and intuitive solution. Stereo matching methods have been investigated for many years to explore the correspondence between stereo image pairs [36], [37], [38]. These methods use various pixel-wise searches to find the nearest-neighbor matches between images. However, they capture the correspondence from pixel to pixel, which incurs a high computational cost and is not efficient for superpixel matching. Therefore, patch matching [37] combined with superpixels [39], [40] was proposed as an evolutionary version. These methods match the superpixels in multiple images by using pixel-wise moves to find the closest matches, similar to patch matching. Subsequently, superpatch matching [41] was proposed to correlate superpixels by patch-wise matching across multiple images. Nevertheless, these methods do not take into account the occlusion caused by drastic disparity variation in stereo images. When matching, they select for each superpixel the most similar superpixel in the other image, which may introduce matching errors, since superpixels in the occluded areas should not be matched at all. Therefore, fully considering the occlusion relationship to effectively model the correspondence is one of the problems to be solved in stereo superpixel segmentation.
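One common way to detect the occluded areas mentioned above is a left-right consistency check on disparity maps: a left pixel whose disparity is not confirmed by the right view is treated as occluded. The sketch below is a generic illustration of this check, not part of any cited method; it assumes dense disparity maps are available, and the function name and tolerance `tau` are illustrative.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, tau=1.0):
    """Left-right consistency check: a left pixel is 'paired' if the right
    view's disparity at its matched location agrees within tau; otherwise
    it is treated as occluded / non-paired."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    # where each left pixel lands in the right image (clipped to bounds)
    xr = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    dr = np.take_along_axis(disp_right, xr, axis=1)
    return np.abs(disp_left - dr) <= tau   # True = paired, False = non-paired
```

Pixels failing this check are exactly those for which forcing a match in the other view would produce the matching errors discussed above.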
To address these problems, we propose an iterative superpixel segmentation framework for stereo image pairs based on interactive parallax consistency, in which occlusion-aware superpixel matching and left-right interactive optimization are designed. First, an existing superpixel segmentation method (e.g., SLIC [29], LSC [31]) is applied independently to the left and right images to generate the corresponding initial superpixels. Then, considering the parallax relationship between the left and right views, each image is divided into paired regions and non-paired regions. This division determines the occlusion status, which can then be modeled more reasonably in the subsequent processing. Based on the region division, we integrate two modules into an iterative update framework, i.e., superpixel matching and collaborative optimization. The superpixel matching process performs local search and matching on the paired regions while leaving the non-paired regions untouched, and then generates a matching relation matrix, which effectively alleviates the matching errors caused by occlusion. The collaborative optimization process designs an energy function to coordinately refine the matched superpixels of the left and right images in an interactive manner, generating more homogeneous superpixels with sharper boundaries.
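As a rough illustration of what a local-search superpixel matching step could look like (a hedged sketch under our own assumptions, not the paper's actual model), the following matches each paired left superpixel to the most similar right superpixel whose center lies within a horizontal disparity window, and leaves non-paired superpixels unmatched (entry -1 in the relation array). The function name, the vertical tolerance `y_tol`, and `max_shift` are all illustrative.

```python
import numpy as np

def match_superpixels(feat_l, cen_l, feat_r, cen_r, paired_l,
                      max_shift=30, y_tol=5):
    """Local-search matching: for each paired left superpixel, pick the most
    similar right superpixel whose center lies in the disparity window.
    Returns match[i] = index of the matched right superpixel, or -1."""
    match = -np.ones(len(feat_l), dtype=int)
    for i, (f, c) in enumerate(zip(feat_l, cen_l)):
        if not paired_l[i]:
            continue                      # occluded superpixels stay unmatched
        best, best_d = -1, np.inf
        for j, (g, cr) in enumerate(zip(feat_r, cen_r)):
            dx = c[1] - cr[1]             # positive: right-view center shifted left
            if abs(c[0] - cr[0]) > y_tol or not (0 <= dx <= max_shift):
                continue                  # outside the local search window
            d = np.linalg.norm(np.asarray(f) - np.asarray(g))
            if d < best_d:
                best, best_d = j, d
        match[i] = best
    return match
```

Restricting candidates to a one-sided horizontal window reflects the epipolar geometry of a rectified stereo pair, and skipping non-paired superpixels is what prevents forced matches in occluded areas.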
The main contributions of the proposed work are summarized as follows:
- We propose a superpixel segmentation framework for stereo image pairs based on left-right interactive parallax consistency. The framework generates stereo superpixels by integrating the mutual information from both the left and right views, and, to the best of our knowledge, it is the first that can convert conventional single-image superpixel segmentation methods to produce stereo superpixels.
- We divide the images into paired and non-paired regions, and generate a matching relation matrix of left-right stereo superpixels with a local-search-based matching model that accounts for the occlusion caused by drastic disparity variation in stereo images.
- We propose a collaborative optimization scheme that coordinately refines the matched superpixels of the left and right images in an interactive manner, generating more accurate and perceptually consistent superpixels.
- Extensive experiments demonstrate that, for stereo superpixel segmentation, the proposed collaborative optimization framework achieves superior performance compared with single-image superpixel segmentation, both quantitatively and qualitatively.
Section snippets
Pipeline
The framework is illustrated in Fig. 2. As shown in Fig. 2, the proposed stereo superpixel framework includes three main components, i.e., disparity guided region division, superpixel matching via local searching, and left-right collaborative optimization. According to the parallax relationship between the left and right images, we first divide the left and right images into paired regions and non-paired regions jointly. Then, we initialize the superpixels using a superpixel segmentation method
Datasets and parameters selection
Since there is no standard ground truth of superpixel segmentation for stereo images, we choose one dataset with disparity ground truth and another without any ground truth to evaluate our method extensively. To compare the experimental results fairly, we adopt the following public stereo image datasets: the Middlebury stereo dataset [43] and the Flicker datasets [38]. Middlebury includes 21 image series with 7 views taken under three different illuminations, and with three different
Conclusion
In this paper, we have proposed a superpixel segmentation framework for stereo image pairs based on interactive parallax consistency for the first time. Considering the parallax relationship between the left and right views, occlusion-aware superpixel matching and left-right interactive optimization are designed, instead of segmenting the stereo image pairs separately as in the conventional approach. The superpixel matching process generates a matching matrix in the paired regions to alleviate the
CRediT authorship contribution statement
Hua Li: Conceptualization, Methodology, Software. Runmin Cong: Data curation, Conceptualization, Methodology. Sam Kwong: Supervision, Writing - review & editing. Chuanbo Chen: Supervision, Writing - review & editing. Qianqian Xu: Writing - review & editing. Chongyi Li: Writing - review & editing.
Acknowledgment
This work was supported by the Key Project of Science and Technology Innovation 2030 supported by the Ministry of Science and Technology of China under Grant 2018AAA0101301, in part by the Natural Science Foundation of China under Grants 61772344, 62002014, in part by the Hong Kong RGC General Research Funds under 9042816 (CityU 11209819), in part by the Beijing Nova Program under Grant Z201100006820016, in part by the Fundamental Research Funds for the Central Universities under Grant
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (46)
- et al., "Mcch: a novel convex hull prior based solution for saliency detection," Inf. Sci., 2019.
- et al., "Object tracking under large motion: combining coarse-to-fine search with superpixels," Inf. Sci., 2019.
- et al., "Spatiotemporal road scene reconstruction using superpixel-based markov random field," Inf. Sci., 2020.
- et al., "Hyperspectral image denoising with superpixel segmentation and low-rank representation," Inf. Sci., 2017.
- et al., "Cross-trees, edge and superpixel priors-based cost aggregation for stereo matching," Pattern Recogn., 2015.
- et al., "Complementary saliency driven co-segmentation with region searching and hierarchical constraint," Inf. Sci., 2016.
- et al., "Regularity preserved superpixels and supervoxels," IEEE Trans. Multimedia, 2014.
- et al., "Review of visual saliency detection with comprehensive information," IEEE Trans. Circuits Syst. Video Technol., 2019.
- et al., "Video saliency detection via sparsity-based reconstruction and propagation," IEEE Trans. Image Process., 2019.
- et al., "Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior," IEEE Trans. Image Process., 2016.