Machine learning-assisted region merging for remote sensing image segmentation

https://doi.org/10.1016/j.isprsjprs.2020.07.017

Abstract

With the increasing popularity of object-based image analysis (OBIA), many scholars advocate that image segmentation plays a significant role in remote sensing image processing. Numerous segmentation algorithms for remote sensing images are based on region merging. Although these algorithms have improved considerably, their accuracy still depends on parameter settings, leading to a low level of automation. To overcome this issue, this work proposes a new region merging method that uses a random forest (RF) classifier. Unlike traditional region merging methods, which all adopt a scale threshold to determine whether a merging can be conducted, the new algorithm relies on a trained RF to decide the result of a merging test. Various merging criteria are simultaneously employed as feature variables of the RF model, enhancing the quality of the proposed scheme. The major problem in this work is how to train the RF classifier, since the merging test samples need to be obtained in the iterative steps of a region merging process, which involves a huge number of human–computer interactions even for a small image. To simplify this task, a sample collection strategy based on a set of three-scale segmentation results is devised, which yields representative merging test samples. To validate the proposed technique, four Gaofen-2 images are used for training sample collection, and the most interesting result is that the samples extracted from one image can be applied to the others. Experiments on images captured by Orbview-3, GeoEye-1, and Worldview-2 further indicate the robust performance of the new algorithm and of the samples acquired in this work.

Introduction

Object-based image analysis (OBIA) (or geographic object-based image analysis, GEOBIA) is a useful technique for land cover classification based on various remote sensing image data (Hurskainen et al., 2019, Blaschke et al., 2014, Blaschke, 2010). Studies published over the past two decades have demonstrated the success of OBIA in a wide range of applications, such as urban structure monitoring (Bialas et al., 2019, Pelizari et al., 2018, Cai et al., 2017), ecological wetlands surveying (Mahdianpari et al., 2019, Liu and Abd-Elrahman, 2018a, Niesterowicz and Stepinski, 2017), and crop type identification (Appice and Malerba, 2019, Luciano et al., 2019). The most intriguing feature of OBIA is that the image segment (or object) serves as the processing unit (Chen et al., 2018, Ye et al., 2018). This leads to two advantages. First, segments enable the use of geometric and spatial contextual features, which are useful for the classification of human-affected areas, e.g. urban and agricultural landscapes, because in such regions most geo-objects have regular shapes that provide useful cues for their recognition. Second, salt-and-pepper noise, which frequently appears in pixel-based classification, is much less prevalent in the results of OBIA. This is because spurious and outlier pixels are grouped into their neighboring segments before classification is performed.

However, the performance of OBIA often suffers from low-quality segments, which are caused by image segmentation errors. Two cases need to be explained to understand this phenomenon. The first is over-segmentation error, which means that a geo-object is partitioned into more than one segment. In this situation, the shape and contextual information cannot fully describe the target geo-object, and thus they contribute to classification confusion. The second is under-segmentation error, which takes place when a segment contains more than one real geo-object. This results in incorrect classification, and some studies have reported that it is the main error source of OBIA (Su, 2019a, Hossain and Chen, 2019, Liu and Xia, 2010). In many real applications, the two error types co-exist (Su and Zhang, 2017), causing more complicated difficulties and failures for object-based classification. Accordingly, generating accurate image segmentation plays a key role in OBIA.

Although a great number of studies aim to enhance the quality of remote sensing image segmentation, the results are still not satisfactory, since no existing segmentation approach can guarantee error-free segmentation results (Hossain and Chen, 2019). These methods are based on different models, and the most popular ones include Markov random field (Wang et al., 2017, Qin and Clausi, 2010, Yu and Clausi, 2008), conditional random field (Yang et al., 2018, Chai, 2017, Zhang et al., 2015), active contour (Braga et al., 2017, Yan and Roy, 2014), mean-shift (Liu et al., 2018a, Liu et al., 2018b, Michel et al., 2015, Ming et al., 2015), watershed (Gaetano et al., 2015, Tarabalka et al., 2010), and region merging (Wang et al., 2020, Su, 2019a, Su, 2019b, Zhang et al., 2019). Among these schemes, region merging is very popular since it is easy to implement and can produce good segmentation results for diverse image types, such as multi-spectral (Wang et al., 2018, Liu et al., 2018a, Liu et al., 2018b), hyper-spectral (Li et al., 2018, Golipour et al., 2016), and SAR (Ji et al., 2019, Yu et al., 2013) images. Another important reason is the great success of the commercial software eCognition, which contains a region merging-based segmentation module, multi-resolution segmentation (MRS) (Baatz and Schäpe, 2000). Most recent OBIA papers report that they rely on MRS to produce segmentation results. Some representative examples are Kim et al. (2020), Bialas et al. (2019), Castro et al. (2018), and Liu and Abd-Elrahman (2018b). These studies have in turn stimulated a large number of works that aim to improve region merging-based segmentation algorithms (Hossain and Chen, 2019). These works fall into two research directions, which are outlined in the following.

In the first direction, studies attempt to determine the optimal scale parameter for a region merging technique. Region merging is a bottom-up process in which the merging of pixels or small segments forms larger segments, and the scale parameter determines whether a merging is feasible. In practice, if the merging cost of a pair of segments is higher than the scale threshold, the merging will not take place; otherwise, the algorithm merges the two segments. High scale values lead to large-area segments, some of which may contain more than one geo-object, i.e., under-segmentation error. Conversely, low scale values result in small segments, so that many geo-objects are not completely singled out, i.e., over-segmentation error. Setting the scale appropriately is therefore a key issue for region merging segmentation.
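The threshold rule just described can be illustrated with a minimal sketch. It assumes a toy one-dimensional row of segments stored as (mean, size) tuples and a hypothetical merging cost (the absolute difference of segment means); neither choice comes from the methods cited here, and real algorithms use richer criteria and two-dimensional adjacency.

```python
def merge_cost(a, b):
    """Hypothetical merging cost: absolute difference of segment means."""
    return abs(a[0] - b[0])


def merge(a, b):
    """Merge two (mean, size) segments into one, using a size-weighted mean."""
    size = a[1] + b[1]
    mean = (a[0] * a[1] + b[0] * b[1]) / size
    return (mean, size)


def region_merging(segments, scale):
    """Repeatedly merge the cheapest adjacent pair until its cost exceeds scale."""
    segs = list(segments)
    while len(segs) > 1:
        costs = [merge_cost(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > scale:
            # Cost higher than the scale threshold: no further merging.
            break
        segs[i:i + 2] = [merge(segs[i], segs[i + 1])]
    return segs


# Toy row with three "geo-objects" around 11, 51, and 90.
pixels = [(10, 1), (11, 1), (12, 1), (50, 1), (52, 1), (90, 1)]
low = region_merging(pixels, scale=0.5)    # nothing merges: over-segmentation
mid = region_merging(pixels, scale=5)      # three segments, one per object
high = region_merging(pixels, scale=100)   # everything merges: under-segmentation
print(len(low), len(mid), len(high))
```

In this toy setting only the middle scale value recovers the three objects, which is the sensitivity to the scale parameter that the paragraph above describes.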

However, the manual tuning of scale is a trial-and-error process, which is often labor-intensive and time-consuming, and even for the same scene, different users may choose different values as the optimal scale (Wang et al., 2019, Xiong et al., 2018, Zhou et al., 2017). These problems have motivated scholars to develop automatic and objective scale estimation approaches. Although there are a few supervised techniques suitable for scale selection (Su and Zhang, 2017, Novelli et al., 2017, Liu et al., 2012), they all require reference segments, the preparation of which is often difficult and may introduce subjectivity. From the perspective of operational applications, scale estimation methods should be unsupervised. Most of these algorithms use within-segment homogeneity and between-segment heterogeneity to quantify the appropriateness of a scale value. Johnson and Xie (2011) exploited area-weighted variance and global Moran’s I. They normalized the two metrics and then integrated them to indicate the goodness of a scale. Zhang et al. (2012) developed an unsupervised metric named “Z”, which accounts for global intra-segment homogeneity and inter-segment heterogeneity. Yang et al. (2014) designed a scale selection method based on the change rate of the spectral angle. In a subsequent study, Yang et al. (2015) employed the local peak of spectral variation as the indicator of optimal scale. Böck et al. (2017) improved Johnson and Xie’s method by overcoming its instability issue. A very recent study by Wang et al. (2019) is also an improved version of Johnson and Xie’s approach, but it used a local heterogeneity model and the F1 measure to obtain better results.
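To make the idea of combining within-segment homogeneity and between-segment heterogeneity concrete, the following is a minimal sketch of the two statistics named above, area-weighted variance and global Moran's I with binary adjacency weights. The function names, the min-max normalization, and the plain sum used to combine the two metrics are assumptions for illustration and are not taken from the cited papers.

```python
def area_weighted_variance(areas, variances):
    """Mean within-segment variance, weighted by segment area."""
    return sum(a * v for a, v in zip(areas, variances)) / sum(areas)


def morans_i(values, edges):
    """Global Moran's I with binary weights.

    values: mean value of each segment.
    edges: adjacent segment index pairs, each undirected pair listed once.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each undirected edge contributes twice (w_ij and w_ji).
    cross = 2 * sum(dev[i] * dev[j] for i, j in edges)
    weight_sum = 2 * len(edges)
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


def min_max(x, lo, hi):
    """Assumed normalization: rescale x to [0, 1] over the tested scales."""
    return (x - lo) / (hi - lo)


def best_scale(metrics):
    """metrics maps scale -> (weighted variance, Moran's I).

    Assumption: the scale with the lowest sum of normalized metrics is best,
    i.e. homogeneous segments that differ from their neighbors.
    """
    wv = [m[0] for m in metrics.values()]
    mi = [m[1] for m in metrics.values()]

    def score(scale):
        v, i = metrics[scale]
        return min_max(v, min(wv), max(wv)) + min_max(i, min(mi), max(mi))

    return min(metrics, key=score)


# Hypothetical metric values for three candidate scales.
candidates = {10: (5.0, 0.8), 20: (8.0, 0.3), 30: (20.0, 0.1)}
print(best_scale(candidates))
```

With these made-up numbers the middle scale wins, because the smallest scale leaves neighboring segments too similar (high Moran's I) and the largest scale makes segments too heterogeneous (high variance).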

The aforementioned methods can help users objectively identify the optimal scale, but they all share some shortcomings. First, different approaches may lead to inconsistent results due to the differences in the estimation, or selection principle (Shen et al., 2019). How to choose the appropriate scale determination strategy is not an easy task even for an expert-level user. Second, the output of these algorithms is only one or a set of scale values, which can only be used globally for the traditional region merging algorithms. For the regions where geo-objects have very different sizes, a scale used globally cannot produce the optimal result for some parts of the image. To overcome these problems, some recent studies try to propose scale-variable region merging strategies, which can be deemed as a combination of a region merging process and an unsupervised scale estimation technique. Note that in these methods, the scale selection or estimation approach needs to derive locally optimal scale values for different image parts. Yang et al. (2017) developed a local scale variable methodology, which relies on local spectral homogeneity to locally adapt the globally optimal scale. Su (2019a) proposed a similar method, but his technique adopts super-segments for local scale refinement. From a different perspective, Shen et al. (2019) attempted to choose the optimal segments from a series of multi-scale segmentation results. Although Shen’s method did not involve scale estimation, the proposed process of segment selection is quite similar to a local optimal scale identification.

Another issue for the scale-related methods is that, aside from the scale threshold, many other parameters also have a significant impact on the quality of a region merging segmentation. So far, researchers have not paid much attention to methods for determining such parameters. Overlooking them may lead to biased opinions on how to obtain the optimal segmentation result. For example, besides scale, the popular MRS algorithm takes shape and compactness coefficients as input, and these two parameters affect the region merging process in a very different way from scale. Most existing studies ignore how to automatically and optimally identify the two coefficients, though a few studies have mentioned this aspect (Schultz et al., 2015). A similar problem will always appear when a newly designed region merging approach has input parameters other than scale.

In addition to the parameter-related or, more precisely, scale-related efforts, the second research direction mainly focuses on the design of new region merging algorithms. The key element of these works is to avoid inappropriate merging operations and encourage proper ones. The most common solution is to devise an effective metric to quantify the merging cost. Such metrics are generally known as merging criteria. Most criteria consider spectral information, such as the band-mean square error (Tilton et al., 2012), which is an early metric that dates back to 1989 (Beaulieu and Goldberg, 1989). Spectral angle is also a representative example, and many mainstream region merging techniques have adopted it (Yang et al., 2017, Yang et al., 2015). Spectral heterogeneity is another frequently used metric, but it is often combined with shape heterogeneity, as suggested by the well-known MRS algorithm (Baatz and Schäpe, 2000). The merit of combining spectral and shape heterogeneity is that the resulting segments have more regular geometric forms, leading to a visually pleasing segmentation and classification result. However, this may compromise segmentation accuracy. Aside from spectral and geometric features, spatial contextual information is also a very important factor. Yu and Clausi (2008) proposed a segmentation method based on Markov random field and region merging, in which the edge penalty serves as the spatial contextual cue to guide the region merging process. Zhang et al. (2013) developed a boundary-constrained region merging approach, the merging criterion of which is a combination of spectral, geometric, and edge strength information. In this approach, the edge strength of the common boundary aims to reflect the inter-segment spatial contextual relationship. Besides the aforementioned algorithms, some scholars have proposed multi-phase region merging techniques, in which different phases use different merging criteria. For example, Yu et al. (2013) established a SAR image segmentation method based on a two-stage process. The first stage coarsely merges the super-pixels corresponding to homogeneous parts, and the second stage deals with the merging of near-boundary segments.

When devising a new region merging approach, the merging order also plays a crucial role in segmentation performance. Unlike the merging criterion, which quantifies the appropriateness or cost of a merging, the merging order refers to which pair of segments should be processed first or with higher priority. Regardless of the effects introduced by the merging criterion, there are primarily four rules to guide merging order: local fit, local best fit, local mutual best fit, and global (mutual) best fit. To save space, details of the four rules are not presented here; interested readers are referred to Baatz and Schäpe (2000). However, it is crucial to know that the strictest rule is the global best fit, followed by local mutual best fit, local best fit, and local fit. A stricter rule has a higher chance of deriving an accurate segmentation, but it requires more computational resources. Recently, Zhang et al. (2019) explored the effects of local best fit, local mutual best fit, and global best fit on the seed region shift phenomenon, which can be recognized as a change of merging order. Their experiments indicated that shifts of merging order affect the growth rate of segments, and that frequent shifts triggered by local mutual best fit or global best fit lead to evenly growing segments and better segmentation accuracy. Beyond these four rules, other types of explicit formulation are possible for merging order, but research of this kind is quite sporadic. An example is Su (2017), who used intra-segment spectral variance and inter-segment edge strength to explicitly define a merging order model.
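Since the four rules are only named above, a small sketch may help show how three of them differ in which pair they select. It reflects a common reading of the definitions in Baatz and Schäpe (2000) rather than any particular implementation, and it assumes a hypothetical merging cost (absolute difference of segment means) on a toy adjacency graph.

```python
def best_neighbor(seg, means, neighbors):
    """Neighbor of seg with the lowest (hypothetical) merging cost."""
    return min(neighbors[seg], key=lambda n: abs(means[seg] - means[n]))


def local_best_fit(seed, means, neighbors):
    """Pair the seed with its cheapest neighbor."""
    return (seed, best_neighbor(seed, means, neighbors))


def local_mutual_best_fit(seed, means, neighbors):
    """Follow best neighbors from the seed until two segments choose each other."""
    a = seed
    for _ in range(len(means)):  # bounded in case cost ties create a cycle
        b = best_neighbor(a, means, neighbors)
        if best_neighbor(b, means, neighbors) == a:
            return (a, b)
        a = b  # the seed shifts to the better-matching neighbor
    return None


def global_best_fit(means, neighbors):
    """Pick the cheapest adjacent pair in the whole image."""
    pairs = {tuple(sorted((a, b))) for a in neighbors for b in neighbors[a]}
    return min(pairs, key=lambda p: abs(means[p[0]] - means[p[1]]))


# Toy chain of four segments: A - B - C - D.
means = {"A": 10, "B": 14, "C": 15, "D": 30}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

print(local_best_fit("A", means, neighbors))         # ('A', 'B')
print(local_mutual_best_fit("A", means, neighbors))  # ('B', 'C')
print(global_best_fit(means, neighbors))             # ('B', 'C')
```

Starting from seed A, local best fit accepts the pair (A, B), whereas the stricter rules move on to (B, C), the pair whose members prefer each other. The global rule needs a search over all adjacent pairs, which is why stricter rules cost more computation.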

As mentioned above, there exists remarkable progress on the improvement of region merging segmentation. However, it is still far from satisfactory to achieve perfect performance by using the existing region merging methods. According to previous studies, 2 serious problems need to be considered to further improve segmentation performance. For one thing, parameter setting may rank the top since this issue not only affects segmentation accuracy to a large extent but also influences the automation degree and thus operational feasibility of image segmentation and OBIA. How to lower the sensitivity of a segmentation approach to its parameter set is always a critical but hard research line. The ideal situation is that the algorithm does not have input parameters to tune, and users can obtain satisfactory segmentation results by simply pushing the “run” button. This is attractive, especially from an operational perspective, but none of the current methods can achieve it.

For another, it is difficult to take full advantage of various types of information, such as spectra, geometry, and spatial context, to improve the formulation of the merging criterion or merging order. A linear model is the most common way to fuse diverse information into one metric for a region merging algorithm. The general form of such a model is m = α × m1 + β × m2, where m is an ensemble metric, m1 and m2 are two metrics derived by using different features, and α and β are balancing factors that adjust the relative contributions of m1 and m2, respectively. Almost all of the aforementioned studies adopted this type of formulation (Zhang et al., 2019, Zhang et al., 2014, Su, 2017, Baatz and Schäpe, 2000). A region merging method based on this model type cannot guarantee that the different types of information work optimally together, since the linear combination will always introduce confusion. For example, suppose m1 and m2 refer to spectral and geometric merging criteria, respectively. When considering the merging of two segments with dissimilar spectra (corresponding to a relatively high m1), if their m2 value is quite low, the approach may mistakenly merge the two segments because the value of m is sufficiently low. Under such circumstances, factors like α and β need to be tuned. However, in most real situations, the relationship between different types of information is highly abstract and difficult for users to understand. Consequently, any algorithm based on a linear model will face the problem of confusing the contributions of different features.
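The confusion introduced by the linear model can be seen in a short worked example. The criterion values, the balancing factors α = β = 0.5, and the scale threshold of 25 below are made up purely for illustration.

```python
def linear_cost(m1, m2, alpha=0.5, beta=0.5):
    """Ensemble merging cost m = alpha * m1 + beta * m2."""
    return alpha * m1 + beta * m2


scale = 25  # hypothetical scale threshold

# Pair 1: very dissimilar spectra (high m1) but a favorable shape term (low m2).
pair1 = linear_cost(m1=40, m2=2)
# Pair 2: moderately similar on both criteria.
pair2 = linear_cost(m1=20, m2=22)

print(pair1, pair2)                    # 21.0 21.0
print(pair1 <= scale, pair2 <= scale)  # True True
```

Both pairs receive the same cost and both are merged, even though the first pair is spectrally dissimilar. Once the two criteria are collapsed into a single number, the algorithm can no longer tell which feature produced a low cost.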

This paper aims to solve the problems of (1) parameter setting and (2) confusion of different information. The proposed technique is a machine learning-assisted region merging algorithm. Unlike existing approaches, in which each merging takes place only if the merging cost is lower than the predefined scale, we formulate the merging process as a series of binary classification problems. More specifically, a machine learning classifier decides whether a merging is feasible. Furthermore, several merging criteria based on different sources of information serve as the input feature vector of the binary classifier. In this way, there is no need to set the scale or the balancing coefficients required by the linear model. The difficulty of such an algorithm is how to train the binary classifier, which decisively affects the segmentation performance. We provide a simple solution to this issue by using a set of 3-scale segmentation results. Section 2 gives more details of the principle. Note that we choose random forest (RF) as the base classifier for algorithm development. In remote sensing, RF is a widely used machine learning technique, and there is plenty of literature reporting the success of this model (Bialas et al., 2019, Goldblatt et al., 2018, Belgiu and Dragut, 2016). However, almost all previous studies applied RF to traditional classification problems, and none has explored the use of RF in the sub-problems of image segmentation based on region merging.
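The reformulation of the merging test as binary classification can be sketched as follows. This is only an illustration under stated assumptions: the three features, the segment dictionaries, and the rule-based `stand_in_classifier` are hypothetical, and the last of these merely takes the place of a trained RF so that the example runs without external libraries. The paper's actual features and training procedure are described in its Section 2, which is not reproduced here.

```python
def merge_features(a, b):
    """Hypothetical feature vector describing a candidate merge of a and b."""
    spectral_diff = abs(a["mean"] - b["mean"])
    std_diff = abs(a["std"] - b["std"])
    size_ratio = min(a["size"], b["size"]) / max(a["size"], b["size"])
    return [spectral_diff, std_diff, size_ratio]


def merging_tests(segments, adjacent_pairs, classify):
    """Return the adjacent pairs that the binary classifier accepts (label 1).

    No scale threshold or balancing coefficient appears here: the decision
    is delegated entirely to the classifier.
    """
    return [
        (i, j)
        for i, j in adjacent_pairs
        if classify(merge_features(segments[i], segments[j])) == 1
    ]


def stand_in_classifier(features):
    """Placeholder for a trained classifier; NOT a random forest."""
    spectral_diff, std_diff, _ = features
    return 1 if spectral_diff < 5 and std_diff < 3 else 0


segments = [
    {"mean": 10.0, "std": 2.0, "size": 40},
    {"mean": 12.0, "std": 2.5, "size": 35},
    {"mean": 60.0, "std": 9.0, "size": 80},
]
accepted = merging_tests(segments, [(0, 1), (1, 2)], stand_in_classifier)
print(accepted)  # [(0, 1)]
```

In practice, `classify` would wrap the prediction call of a classifier trained on labeled merging test samples, and the function would be applied repeatedly as segments grow.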

Section snippets

Outline of the proposed algorithm

The proposed image segmentation technique is a region merging-based process. An illustration of the overall work-flow is in Fig. 1. To highlight the differences between the new algorithm and a traditional region merging approach, Fig. 1(b) also provides the overall process of the latter. As can be seen, most of the procedures are similar, except for 3 steps marked by the red edges.

The first difference is super-segmentation, which is mandatory for our algorithm, while for most of the traditional

Image data

In this work, we employ four multi-spectral images of high spatial resolution to conduct experiments. These images were all acquired by a Chinese remote sensing satellite, Gaofen-2, which is equipped with two cameras that capture panchromatic and multi-spectral imagery, respectively. The panchromatic sensor has a spatial resolution of 1.0 m, while that of the multi-spectral sensor is 4.0 m. The two instruments allow the acquisition of clear and sharp images with abundant earth observation information, which has been used in various

Effects of super-pixel segmentation

Before testing the proposed segmentation approach, it is necessary to analyze how the quality of super-pixel influences the final segmentation performance. As mentioned in Section 3.2, the two parameters of the adopted super-pixel segmentation are l and Tspec. For l, it is varied from 5 to 25, with Tspec fixed at 30. The TE curves for this analysis are produced for T1 and T3, as can be seen in Fig. 8(a) and (c). TE rises with the increase of l, indicating that larger spacing is detrimental to

Discussion

We claim that the experimental results introduced in Section 4 are encouraging. These results not only demonstrate the superiority of the proposed algorithm but also indicate that MARM has great potential for the segmentation of other multispectral images. The new segmentation method is discussed in the following.

Based on the principle and validation experiment of the proposed MARM technique, there are mainly two factors affecting its performance. They are 1) the intrinsic randomness of RF; 2)

Conclusion

This article presents a novel segmentation algorithm for remote sensing images. The major contribution is to combine a random forest (RF) classifier with a region merging method, which leads to three main merits. The first is that parameters such as scale, and balancing coefficients such as the shape and compactness factors in the popular multi-resolution segmentation, are not needed in the new technique. This makes it quite simple and efficient for novice users to obtain satisfactory

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is jointly supported by the National Natural Science Foundation of China, under grants 61701265 and 51620105003, and the Inner Mongolia Science Fund for Distinguished Young Scholars, under grant 2019JQ06. The authors want to thank the three anonymous reviewers, whose comments were very constructive and helpful for the improvement of this work. The authors also want to thank Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and

References (70)

  • M. Kim et al., Object-based landfast sea ice detection over West Antarctica using time series ALOS PALSAR data, Remote Sens. Environ. (2020)
  • T. Liu et al., Multi-view object-based classification of wetland land covers using unmanned aircraft system images, Remote Sens. Environ. (2018)
  • T. Liu et al., Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification, ISPRS J. Photogramm. Remote Sens. (2018)
  • Y. Liu et al., Discrepancy measures for selecting optimal combination of parameter values in object-based image analysis, ISPRS J. Photogramm. Remote Sens. (2012)
  • A. Luciano et al., A generalized space-time OBIA classification scheme to map sugarcane areas at regional scale, using Landsat images time-series and the random forest algorithm, Int. J. Appl. Earth Obs. Geoinformation (2019)
  • D. Ming et al., Scale parameter selection by spatial statistics for GeOBIA: Using mean-shift based multi-scale segmentation as an example, ISPRS J. Photogramm. Remote Sens. (2015)
  • J. Niesterowicz et al., Pattern-based, multi-scale segmentation and regionalization of EOSD land cover, Int. J. Appl. Earth Obs. Geoinformation (2017)
  • Y. Shen et al., Optimizing multiscale segmentation with local spectral heterogeneity measure for high resolution remote sensing images, ISPRS J. Photogramm. Remote Sens. (2019)
  • T. Su et al., Local and global evaluation for remote sensing image segmentation, ISPRS J. Photogramm. Remote Sens. (2017)
  • T. Su, Scale-variable region-merging for high resolution remote sensing image segmentation, ISPRS J. Photogramm. Remote Sens. (2019)
  • Y. Tarabalka et al., Segmentation and classification of hyperspectral images using watershed transformation, Pattern Recogn. (2010)
  • N. Wang et al., Segmentation of large-scale remotely sensed images on a Spark platform: A strategy for handling massive image tiles with the MapReduce model, ISPRS J. Photogramm. Remote Sens. (2020)
  • Y. Wang et al., Unsupervised segmentation parameter selection using the local spatial statistics for remote sensing image segmentation, Int. J. Appl. Earth Obs. Geoinformation (2019)
  • L. Yan et al., Automated crop field extraction from multi-temporal Web Enabled Landsat data, Remote Sens. Environ. (2014)
  • J. Yang et al., Region merging using local spectral angle thresholds: A more accurate method for hybrid segmentation of remote sensing images, Remote Sens. Environ. (2017)
  • J. Yang et al., A multi-band approach to unsupervised scale parameter selection for multi-scale image segmentation, ISPRS J. Photogramm. Remote Sens. (2014)
  • S. Ye et al., A review of accuracy assessment for object-based image analysis: from per-pixel to per-polygon approaches, ISPRS J. Photogramm. Remote Sens. (2018)
  • X. Zhang et al., Another look on region merging procedure from seed region shift for high-resolution remote sensing image segmentation, ISPRS J. Photogramm. Remote Sens. (2019)
  • X. Zhang et al., Hybrid region merging method for segmentation of high-resolution remote sensing images, ISPRS J. Photogramm. Remote Sens. (2014)
  • X. Zhang et al., Boundary-constrained multi-scale segmentation method for remote sensing images, ISPRS J. Photogramm. Remote Sens. (2013)
  • R. Achanta et al., SLIC: superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intel. (2012)
  • M. Baatz et al., Multiresolution segmentation - an optimization approach for high quality multi-scale image segmentation, Angew. Geographische Informat. Verarbeitung (2000)
  • J. Beaulieu et al., Hierarchy in picture segmentation: a step-wise optimization approach, IEEE Trans. Pattern Anal. Mach. Intel. (1989)
  • Blaschke, T., Hay, G.J., Kelly, M., Lang, S., Hofmann, P., Addink, E., Feitosa, R.Q., van der Meer, F., van der Werff,...
  • S. Böck et al., On the objectivity of the objective function - problems with unsupervised segmentation evaluation based on global score and a possible remedy, Remote Sens. (2017)