A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images

https://doi.org/10.1016/j.isprsjprs.2020.06.003

Abstract

Change detection in high resolution remote sensing images is crucial to the understanding of land surface changes. Because traditional change detection methods are not suited to the fine image details and complex texture features conveyed in high resolution images, a number of deep learning-based change detection methods have been proposed to improve change detection performance. Although the state-of-the-art deep feature based methods outperform all other deep learning-based change detection methods, the networks in existing deep feature based methods are mostly modified from architectures originally proposed for single-image semantic segmentation. Transferring these networks to the change detection task still poses some key issues. In this paper, we propose a deeply supervised image fusion network (IFN) for change detection in high resolution bi-temporal remote sensing images. Specifically, highly representative deep features of the bi-temporal images are first extracted through a fully convolutional two-stream architecture. The extracted deep features are then fed into a deeply supervised difference discrimination network (DDN) for change detection. To improve boundary completeness and internal compactness of objects in the output change maps, multi-level deep features of the raw images are fused with image difference features by means of attention modules for change map reconstruction. The DDN is further enhanced by directly introducing change map losses into intermediate layers of the network, and the whole network is trained in an end-to-end manner. IFN is applied to a publicly available dataset, as well as a challenging dataset consisting of multi-source bi-temporal images from Google Earth covering different cities in China.
Both visual interpretation and quantitative assessment confirm that IFN outperforms four state-of-the-art benchmark methods derived from the literature, returning changed areas with more complete boundaries and higher internal compactness.

Introduction

Change detection aims to identify differences in multi-temporal images of the same area. Monitoring differences in bi-temporal remotely sensed images is crucial to the understanding of land surface changes. Remote sensing images have been widely used for change detection in various applications, such as disaster damage assessment, land cover mapping, and urban expansion investigation (Jin et al., 2013, Mundia and Aniya, 2005, Wang and Xu, 2010). With the development of high resolution optical sensors (e.g., WorldView-3, GeoEye-1, QuickBird, and Gaofen-2), the increasing availability of high resolution remote sensing images has widened the range of potential applications of change detection in high resolution bi-temporal images.

Studies on change detection have been carried out for decades in the remote sensing community. Traditional change detection methods can be broadly categorized into three classes: 1) image arithmetical-based, 2) image transformation-based, and 3) post classification methods. Image arithmetical-based methods directly compare pixel values from multi-temporal images to produce image difference maps, upon which thresholds are applied to classify pixels into the changed or unchanged class. Arithmetical operations, such as image differencing (Singh, 1986), image regression (Jackson, 1983), and image rationing (Todd, 1977), are typically used for image comparison. The key of image arithmetical-based methods is deciding where to place the threshold boundaries that separate changed pixels from unchanged pixels (Singh, 1989). Recently, other machine learning-based methods, such as random forest regression, support vector machines, and kernel regression, have been proposed for remote sensing image change detection (Zerrouki et al., 2018, Luppino et al., 2019, Padron-Hidalgo et al., 2019). For example, Luppino et al. (2019) first utilize affinity matrices to create pseudo training data from co-located patches; four different machine learning algorithms are then tested on the pseudo training data to transform one image into the domain of the other, realizing heterogeneous change detection. Image transformation-based methods transform image spectral combinations into a specific feature space to discriminate changed pixels. Principal Component Analysis (PCA) is one of the most widely used algorithms for dimensionality reduction in image transformation-based methods (Kuncheva and Faithfull, 2014). Saha et al. (2019) propose a cycle-consistent Generative Adversarial Network (CycleGAN) to transcode images from different sensors into the same domain in an unsupervised way, and further realize change detection through deep feature change vector analysis.
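As a concrete illustration of the image arithmetical-based family, a minimal image-differencing sketch follows (NumPy; the fixed threshold value is an arbitrary assumption for illustration, not one taken from any cited method):

```python
import numpy as np

def difference_change_map(img_t1, img_t2, threshold=30.0):
    """Classify pixels as changed (1) or unchanged (0) by absolute differencing.

    img_t1, img_t2 : co-registered grayscale images as float arrays.
    threshold      : empirically chosen decision boundary (assumed here).
    """
    diff = np.abs(img_t2.astype(np.float64) - img_t1.astype(np.float64))
    return (diff > threshold).astype(np.uint8)

# Toy 2x2 bi-temporal pair: two pixels differ strongly between dates.
t1 = np.array([[10.0, 10.0], [10.0, 200.0]])
t2 = np.array([[12.0, 90.0], [10.0, 40.0]])
print(difference_change_map(t1, t2))  # [[0 1] [0 1]]
```

In practice the threshold is the whole difficulty, as noted above: it must be tuned per scene, which is exactly what makes these methods fragile on high resolution imagery.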
Since pixel-based analysis neglects spatial contextual information, object-based change detection methods have been proposed. The main idea of object-based methods is to extract features from segmented image-objects and identify changes in the state of objects (Chen et al., 2012). In the work of Celik (2009), PCA is applied to image difference maps to extract representative features from objects. In post classification methods, bi-temporal images are independently classified and labeled, and changed areas are extracted through a direct comparison of the classification results (Wu et al., 2017). Arithmetical-based and transformation-based methods are typical unsupervised methods. To improve change detection performance, many scholars treat change detection as a problem of explicitly finding land-cover transitions in a supervised way. The most popular supervised approach is post classification. Post classification methods bypass the difficulty of detecting changes directly from raw images acquired at different times; however, they are highly sensitive to the classification results (Deng et al., 2008). Arithmetical-based and transformation-based methods depend heavily on empirically designed algorithms for discriminative feature extraction, which fail to achieve satisfactory results on high resolution images. Moreover, errors generated by the derivation of difference images in pixel-based methods, the uncertainty of segmented objects in object-based methods, and the misclassification errors of bi-temporal images in post classification methods are inevitably propagated through the different stages of change detection and ultimately degrade the results. The fine image details and complex texture features conveyed in high resolution images introduce new challenges for the change detection task, which has led to the rise of deep learning-based change detection methods.
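A minimal sketch of the PCA feature-extraction step used in transformation-based methods (in the spirit of Celik (2009), though the block size and component count here are arbitrary illustrative choices, not values from that paper):

```python
import numpy as np

def pca_block_features(diff_image, block=4, n_components=3):
    """Project non-overlapping blocks of a difference image onto principal axes."""
    h, w = diff_image.shape
    # Vectorize non-overlapping block x block patches into rows.
    patches = (diff_image[:h - h % block, :w - w % block]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))
    centered = patches - patches.mean(axis=0)
    # Eigen-decomposition of the patch covariance matrix.
    cov = centered.T @ centered / max(len(centered) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top  # one low-dimensional feature vector per block

rng = np.random.default_rng(0)
diff = rng.random((16, 16))
features = pca_block_features(diff)
print(features.shape)  # (16, 3): a 4x4 grid of blocks, 3 components each
```

The resulting block features would then be clustered (e.g., by k-means) into changed and unchanged classes; the empirical choice of block size and components is one of the hand-designed decisions that the deep learning-based methods discussed next aim to remove.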
In pixel-based and object-based methods, deep features of pixels or objects are first extracted through deep learning techniques such as deep belief networks, stacked denoising autoencoders, and convolutional neural networks (CNN) (El Amin et al., 2017; Lei et al., 2019b; Zhang et al., 2016). Afterward, difference images or change vectors are generated by deep feature comparison, and the final change maps are produced by clustering or threshold-based classification. Benefiting from the strong high-level feature extraction ability of deep neural networks, these methods achieve superior performance compared to traditional methods. However, the error propagation problem still exists in pixel-based and object-based methods. To address it, deep feature based methods implemented with fully convolutional networks (FCN) have been proposed (Alcantarilla et al., 2018, Bromley et al., 1994, Caye Daudt et al., 2018, Daudt et al., 2018, Peng et al., 2019, Shelhamer et al., 2017). Deep feature based methods transform bi-temporal images into high-level feature spaces and take deep features as the analysis unit. These methods integrate feature extraction and difference discrimination within the network to directly produce the final change maps in an end-to-end manner. It should be noted that most of these networks are modified from networks proposed for single-image semantic segmentation. Transferring them to change detection in bi-temporal images often raises several crucial problems, including the lack of informative deep features of individual raw images in early-fusion methods, the low representativeness of raw image features in late-fusion methods, and the problem of heterogeneous feature fusion.

In this paper, we present a deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. First, highly representative deep features of the bi-temporal images are extracted in parallel through a fully convolutional two-stream architecture. The extracted features are then fed into the difference discrimination network for change detection. Attention modules are exploited in the difference discrimination network to effectively fuse raw image deep features and image difference features for change map reconstruction. Moreover, to further improve network performance, deep supervision is applied by directly introducing change detection losses at intermediate layers of the difference discrimination network.
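The overall flow described above — shared-weight two-stream feature extraction, differencing of the extracted features, and auxiliary losses at intermediate outputs — can be sketched schematically (NumPy; a single shared 3x3 kernel stands in for each stream, and all shapes, weights, and the two-level supervision are illustrative assumptions, not the authors' actual architecture):

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive valid 2-D correlation, standing in for a convolutional layer."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, the typical per-pixel change loss."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3)) * 0.1      # shared weights: one kernel, two streams
img1, img2 = rng.random((8, 8)), rng.random((8, 8))

feat1, feat2 = conv2d_valid(img1, kernel), conv2d_valid(img2, kernel)
diff_feat = np.abs(feat1 - feat2)                # image difference features

# Two "levels" of side outputs, each deeply supervised by its own loss term.
side1 = 1 / (1 + np.exp(-diff_feat))                         # shallow prediction
side2 = 1 / (1 + np.exp(-conv2d_valid(diff_feat, kernel)))   # deeper prediction
target1, target2 = np.zeros_like(side1), np.zeros_like(side2)
total_loss = bce(side1, target1) + bce(side2, target2)       # summed supervision
print(side1.shape, side2.shape, total_loss > 0)
```

Summing losses from intermediate outputs in this way is the core idea of deep supervision (Lee et al., 2015): gradients reach early layers directly rather than only through the final output.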

The rest of the paper is organized as follows: Section 2 reviews the current deep learning-based change detection methods. Problem statements and the proposed solutions are also presented in this section. Section 3 presents the proposed methodology. Experiments and discussions are given in Section 4. Section 5 concludes the paper.

Section snippets

Related work and problem statement

After a broad review in the literature, in Section 2.1, we classify the deep learning-based change detection methods into three categories based on the analysis unit: 1) pixel-based methods, 2) object-based methods, and 3) deep feature based methods. Afterward, problems of existing deep learning-based methods are stated. In order to frame our work within the state-of-the-art, we focus our research on deep feature based methods. Therefore, in Section 2.2, problems of existing deep feature based

Methodology

The proposed network architecture is presented in Section 3.1. Sections 3.2 and 3.3 present the key network components, i.e., the attention modules and deep supervision, respectively. Section 3.4 provides details of model training, including the data augmentation process, the training process, and the loss function.
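As a rough illustration of how an attention module can weight a fusion of raw-image and difference features, consider this generic channel-attention sketch (squeeze-and-excite style gating without learned layers; it is not the paper's exact module):

```python
import numpy as np

def channel_attention(features):
    """Reweight the channels of a (C, H, W) feature map by a global gating signal."""
    squeeze = features.mean(axis=(1, 2))   # global average pool: one scalar per channel
    gate = 1 / (1 + np.exp(-squeeze))      # sigmoid gate in (0, 1) per channel
    return features * gate[:, None, None]  # scale each channel by its gate

# Hypothetical fused input: 2 raw-image channels stacked with 2 difference channels.
raw_feat = np.ones((2, 4, 4))
diff_feat = np.full((2, 4, 4), 0.5)
fused = np.concatenate([raw_feat, diff_feat], axis=0)
attended = channel_attention(fused)
print(attended.shape)  # (4, 4, 4)
```

The gating lets the network emphasize whichever channels (raw-image context or difference evidence) are more informative at a given decoding level; in a trained module the gate would come from learned fully connected layers rather than the raw channel means used here.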

Datasets

Two datasets are utilized in the experiments for a comprehensive benchmark comparison. The first dataset was released by Lebedev et al. (2018); tested on this dataset, a modified UNet++ architecture achieved the best performance (Peng et al., 2019). The second dataset is a challenging one manually collected from Google Earth. Different from the literature, which trains and tests models with images covering the same area, we train and test the model with the second dataset covering different

Conclusions

In this paper, we explicitly explore the mechanisms and point out the key limitations of state-of-the-art deep learning-based change detection methods, including early-fusion and late-fusion architectures. Rather than simply proposing a modified network based on existing architectures, we analyze the reasons behind these problems and propose a deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. Feature extraction of bi-temporal

Acknowledgements

We thank the reviewers and editors for their constructive comments, which helped improve the quality of the paper. This work was supported by the Major State Research Development Program of China (No. 2017YFB0504103), the National Natural Science Foundation of China (No. 41722109), the Hubei Provincial Natural Science Foundation of China (No. 2018CFA053), and the Wuhan Yellow Crane Talents (Science) Program (2016).

Declaration of Interest Statement

Each of the authors confirms that no part of this manuscript has been previously published, nor is any part currently under consideration by any other journal. Additionally, each of the authors has approved the contents of this paper and has agreed to the submission policies of the ISPRS Journal of Photogrammetry and Remote Sensing.

References (43)

  • El Amin, A.M., Liu, Q., Wang, Y., 2017. Zoom out CNNs features for optical remote sensing change detection, in: 2017...
  • Glorot, X., et al., 2010. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
  • Glorot, X., et al., 2011. Deep sparse rectifier neural networks. Journal of Machine Learning Research.
  • Guo, E., Fu, X., Zhu, J., Deng, M., Liu, Y., Zhu, Q., Li, H., 2018. Learning to Measure Change: Fully Convolutional...
  • He, K., et al., 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. The IEEE International Conference on Computer Vision.
  • Hou, B., et al., 2017. Change detection based on deep features and low rank. IEEE Geosci. Remote Sens. Lett.
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: A large-scale hierarchical image...
  • Kuncheva, L.I., et al., 2014. PCA feature extraction for change detection in multidimensional unlabeled data. IEEE Trans. Neural Networks Learn. Syst.
  • Lebedev, M.A., et al., 2018. Change detection in remote sensing images using conditional adversarial networks. ISPRS - Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.
  • Lee, C.Y., et al., 2015. Deeply-supervised nets. Artificial Intelligence and Statistics.
  • Lei, T., et al., 2019. Landslide inventory mapping from bitemporal images using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett.