SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search

https://doi.org/10.1016/j.isprsjprs.2020.11.025

Abstract

Scene classification approaches based on deep learning have attracted much attention for remote sensing imagery. However, most deep learning networks are built with fixed architectures designed for natural image processing, and they are difficult to apply directly to remote sensing images because of their more complex geometric and structural features. There is therefore an urgent need to automatically search for the most suitable neural network architecture from the scene classification image data itself, which requires a powerful search mechanism, and the computational complexity and performance error of the searched network should be balanced to provide a practical choice. In this article, a framework for scene classification network architecture search based on multi-objective neural evolution (SceneNet) is proposed. In SceneNet, the network architecture coding and searching are achieved using an evolutionary algorithm, which allows a more flexible hierarchical extraction of the remote sensing image scene information. Moreover, the computational complexity and the performance error of the searched network are balanced by employing a multi-objective optimization method, and the competitive neural architectures are obtained in a Pareto solution set. The effectiveness of SceneNet is demonstrated by experimental comparisons with several deep neural networks designed by human experts.

Introduction

Scene classification refers to distinguishing the different semantic categories of remote sensing images, i.e., the different features and spatial distributions reflected in the images (Cheng et al., 2017a). Compared with image interpretation at the pixel and object levels, scene-level classification considers the different spatial distribution modes of the objects in high spatial resolution (HSR) remote sensing images (Xia et al., 2017, Zhu et al., 2018). Scene classification has various applications, especially land-use identification and urban planning (Cheng et al., 2015, Zhao et al., 2019). However, it remains an arduous task due to the complex spatial and structural patterns of remote sensing images.

Scene classification has developed rapidly in recent years. Compared with pixel-level classification and object detection, remote sensing scene classification emphasizes the semantic label information, and can provide richer social semantic attributes (Zhao et al., 2016), such as airport, industrial area, commercial area, and golf course. The scene classification methods can be divided into three categories: low-level, middle-level, and high-level methods. The low-level features used in scene classification include the color histogram (Hafner et al., 1995), local binary patterns (LBPs) (Ojala et al., 2002), and the gray-level co-occurrence matrix (GLCM) (Haralick et al., 1973). The middle-level methods, such as the bag-of-visual-words (BoVW) model, are an important way to extract the visual descriptors of the scenes (Zhu et al., 2016, Csurka et al., 2004). However, the high-level features are usually ignored in these traditional approaches based on handcrafted features.
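As a brief illustration of the low-level descriptors mentioned above, the following sketch computes an LBP histogram and a GLCM for a random grayscale patch using scikit-image (assuming a recent version that exposes graycomatrix); the patch and parameter values are purely illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix

# A random grayscale patch stands in for a remote sensing scene image.
patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

# Local binary pattern histogram (Ojala et al., 2002): uniform patterns
# with 8 neighbors at radius 1 give values in the range 0..9.
lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

# Gray-level co-occurrence matrix (Haralick et al., 1973) for one
# displacement (distance 1, angle 0).
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256, symmetric=True)

print(lbp_hist.shape, glcm.shape)  # (10,) (256, 256, 1, 1)
```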

Recently, deep learning based methods have played a vital role in extracting high-level features, and have emerged as a dominant paradigm in pattern recognition and computer vision (Chen and Liu, 2018). In deep learning technology, convolutional neural networks (CNNs) can be regarded as typical data-driven methods, and are powerful tools that can be used to discover the intricate structures and extract the essential information of HSR remote sensing imagery (Zou et al., 2015), in addition to the hierarchical convolutional features of hyperspectral imagery (Cheng et al., 2018a). Networks developed for image classification on ImageNet (Krizhevsky et al., 2012), such as GoogLeNet (Szegedy et al., 2015) and CaffeNet (Jia et al., 2014), have been employed for scene classification (Castelluccio et al., 2015). Furthermore, a series of deep learning based scene classification approaches have been proposed (Cheng et al., 2020). For example, to replace the traditional handcrafted feature descriptors, Cheng et al. (2017b) proposed a scene classification method based on a bag of convolutional features; Cheng et al. (2018b) proposed learning a discriminative CNN to address the inter-class similarity and intra-class diversity in scene image classification; Lu et al. (2019b) designed a scene classification CNN that aggregates end-to-end features; and Gong et al. (2018) put forward a diversity-enhanced metric learning approach for deep structures in HSR image scene classification. Other deep learning based approaches for scene image classification have also been developed by Anwer et al. (2018), Lu et al. (2019a), Han et al. (2018), and Zhu et al. (2019).

However, in order to design a satisfactory deep CNN that can extract the different levels of image information for scene semantic classification, comprehensive domain knowledge of both remote sensing image interpretation and deep learning in computer vision is required (Wang et al., 2020), and strenuous effort must be devoted to the design of the network structure by human experts. Thus, it is natural to consider whether a computer could be used to automatically search for and obtain a suitable data-driven network. Fortunately, thanks to the rapid development of graphics processing units (GPUs) and other hardware, computing power has greatly improved. The Google automatic machine learning (AutoML) platform is a suite of machine learning products, in which the implementation mechanisms are collectively referred to as neural architecture search (NAS) (He et al., 2019, He et al., 2018). The principal idea of NAS can be summarized in three steps: 1) definition of the search space; 2) the search strategy, through which candidate network structures are found and then evaluated; and 3) the performance estimation strategy, in which the next iteration is carried out according to the feedback (Elsken et al., 2018). In the field of natural image interpretation, NAS methods have outperformed manually designed architectures in natural image classification (Zoph and Le, 2017; Zoph et al., 2018; Liu et al., 2018) and semantic segmentation (Liu et al., 2019). This shows that the structure of the network can be established from the characteristics of the dataset itself. Moreover, the search strategy plays an important role in NAS, and can be categorized into three types (Lu et al., 2019c; Xie et al., 2018): 1) gradient-based (GB) search; 2) reinforcement learning (RL) based search; and 3) evolutionary algorithms (EAs). Real et al. (2019) conducted a case study of the different search strategies, and found that the RL- and EA-based methods obtain similar performances, with both performing better than the GB-based methods; in addition, the RL-based methods require more computing resources, while the EA-based methods can obtain smaller models. In fact, EA-based NAS methods, also referred to as neuroevolution methods, have been a topic of interest for some time. Back in the 1990s, Yao and colleagues (Yao, 1993; Yao and Liu, 1997; Yao, 1999) suggested that neuroevolution is a different kind of deep learning. Since then, studies combining evolutionary methods with artificial neural networks (ANNs) have attracted the attention of scholars, and neuroevolution has progressed from shallow architecture search to deep network architecture search. Thus, EA-based NAS is further discussed in this article.
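As a minimal, illustrative sketch of these three steps (not the method used in SceneNet), the following toy Python loop samples architectures from a small search space, scores them with a stand-in performance estimator, and feeds the result back into a plain random-search strategy; all names and values are hypothetical.

```python
import random

# 1) Search space: each architecture is a list of (operation, width) choices.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]
SEARCH_SPACE = {"depth": range(3, 9), "ops": OPS, "widths": [32, 64, 128]}

def sample_architecture(space):
    depth = random.choice(list(space["depth"]))
    return [(random.choice(space["ops"]), random.choice(space["widths"]))
            for _ in range(depth)]

# 3) Performance estimation: a placeholder that would normally train the
# candidate on the scene dataset and return its validation error.
def estimate_performance(arch):
    return random.random()  # stand-in proxy error, not real training

# 2) Search strategy: plain random search as the simplest possible baseline.
def random_search(n_trials=20):
    best_arch, best_err = None, float("inf")
    for _ in range(n_trials):
        arch = sample_architecture(SEARCH_SPACE)
        err = estimate_performance(arch)
        if err < best_err:           # feedback drives the next iteration
            best_arch, best_err = arch, err
    return best_arch, best_err

if __name__ == "__main__":
    arch, err = random_search()
    print(len(arch), "layers, proxy error:", round(err, 3))
```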

Learning from biological evolution and natural selection, the EA-based NAS methods attempt to explain the connection between the CNN structure and natural evolution (Xie and Yuille, 2017). With the rapid development of deep CNNs, EA-based architecture search is attracting more and more attention in the artificial intelligence community (Real et al., 2019, Wang et al., 2019). EA-based architecture search attempts to evolve, design, and build the neural network through the EA, instead of through stochastic gradient descent and manual design. Moreover, EAs have also played a significant role in traditional machine learning for remote sensing image interpretation (Zhong et al., 2018), in applications such as remote sensing image clustering (Wan et al., 2019; Alok et al., 2016), subpixel mapping (Song et al., 2019), sparse unmixing (Gong et al., 2017), and change detection (Song et al., 2018). Thus, EAs represent a potential solution for the automatic search for a satisfactory deep CNN for scene classification. In addition, there is a lack of networks automatically evolved and searched from remote sensing data in image interpretation, which could avoid the need for arduous manual tuning.
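The following toy sketch illustrates the general idea of evolutionary architecture search: encoded architectures are varied by mutation and filtered by selection, rather than being designed by hand or trained into shape by gradient descent alone. It is a generic illustration under an assumed encoding and a placeholder fitness function, not the SceneNet algorithm itself.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_genome(length=6):
    # A genome here is simply a fixed-length list of layer operations.
    return [random.choice(OPS) for _ in range(length)]

def mutate(genome, rate=0.2):
    # Each gene is resampled with a small probability (variation operator).
    return [random.choice(OPS) if random.random() < rate else g for g in genome]

def fitness(genome):
    # Placeholder: in practice, decode the genome into a CNN, train it on the
    # scene dataset, and return the validation accuracy.
    return random.random()

def evolve(pop_size=10, generations=5):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]            # selection
        children = [mutate(random.choice(parents))   # variation
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```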

The objective functions optimized in the EA are also important. In addition to using the test accuracy to evaluate a neural network for image interpretation, the computational complexity of the network should also be considered for a comprehensive evaluation (Lu et al., 2019c; Tan et al., 2019). The computational complexity and the test error/accuracy therefore need to be balanced, so that a competitive solution set can be obtained. Fortunately, the population-based EAs provide an ideal tool for multi-objective optimization problems (Coello, 2006), and multi-objective evolutionary optimization methods have been successfully employed in traditional machine learning methods for remote sensing image interpretation, in applications such as image clustering (Ma et al., 2015), subpixel mapping (Song et al., 2019, Ma et al., 2018), hyperspectral feature selection (Zhang et al., 2018), and hyperspectral image sparse unmixing (Xu and Shi, 2017). Thus, for a NAS-based scene classification network, multi-objective optimization of the test error and the computational complexity should be considered in the evolutionary NAS, to provide a non-dominated choice in a competitive solution set.
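A minimal sketch of this bi-objective idea follows, assuming each candidate network is summarized by a (test error, FLOPs) pair with made-up values: the non-dominated candidates form the Pareto set offered to the user.

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    # candidates: dict of name -> (test_error, flops); keep the non-dominated ones.
    front = []
    for name, score in candidates.items():
        if not any(dominates(other, score)
                   for other_name, other in candidates.items()
                   if other_name != name):
            front.append(name)
    return front

# Illustrative values only: net_c is dominated by net_b in both objectives.
networks = {
    "net_a": (0.08, 4.2e9),   # accurate but computationally heavy
    "net_b": (0.12, 0.9e9),   # lighter but less accurate
    "net_c": (0.13, 1.5e9),   # dominated by net_b
}
print(pareto_front(networks))  # ['net_a', 'net_b']
```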

In this article, in order to automatically search for a satisfactory network for scene classification from the image dataset itself, a framework of scene classification network architecture search based on multi-objective neural evolution is proposed. The main contributions of this article are summarized below:

1) A framework of scene classification network architecture search based on multi-objective neural evolution. The proposed SceneNet is an EA-based NAS approach for the remote sensing image scene classification task, which has not been achieved in the existing studies. In SceneNet, the most suitable network can be automatically searched from the dataset itself, without requiring handcrafted design, and the computational complexity and the performance error of the searched network can be balanced adaptively.

2) Evolutionary algorithm based flexible extraction of the scene information and a powerful search capability. In SceneNet, the connection modes of the network architecture are encoded as binary strings in the EA, and a search space is defined, which allows the different connection modes between the convolutional layers to be tested flexibly (see the sketch after this list), so that a more flexible hierarchical extraction of the remote sensing image scene information can be achieved for a better classification result. Moreover, the powerful exploitation and exploration search capabilities for the network architecture can be attributed to the global search capability of the EA and the local search capability of the Bayesian optimization algorithm (BOA) (Pelikan et al., 1999).

3) The multi-objective trade-off for network design. For real-world deployment, the computational complexity should be optimized simultaneously with the accuracy, so the number of floating-point operations (FLOPs) is taken as the measure of computational complexity. The accuracy and the computational complexity are thus balanced in the proposed SceneNet method by utilizing multi-objective optimization. The non-dominated network individuals and a competitive Pareto optimal solution set are then obtained, providing the user with practical choices.
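The sketch below, referenced in contribution 2 above, illustrates one generic way a binary string can describe connection modes between convolutional layers; it is a hypothetical encoding for illustration, not necessarily the exact scheme used in SceneNet.

```python
import itertools

def decode_connections(bits, n_layers):
    """Map a binary string onto the candidate layer pairs (i, j) with i < j:
    bit k decides whether the connection for pair k is enabled."""
    pairs = list(itertools.combinations(range(n_layers), 2))
    assert len(bits) == len(pairs), "one bit per candidate connection"
    return [pair for bit, pair in zip(bits, pairs) if bit == "1"]

# Example: 4 convolutional layers -> 6 candidate connections.
genome = "101001"
print(decode_connections(genome, n_layers=4))
# [(0, 1), (0, 3), (2, 3)]
```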

The rest of this paper is organized as follows. The related research background is presented in Section 2. The proposed SceneNet method is introduced in Section 3. Section 4 describes the experiments undertaken in this study. A discussion and our conclusions are provided in Section 5.

Section snippets

The classic CNN classification networks

With regard to CNN classification networks, many successful examples have been applied to scene classification, with the aim of capturing the global information of the remote sensing imagery. As shown in Fig. 1(a), AlexNet, which was designed by Krizhevsky et al. (2012), obtained first place in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012). AlexNet has eight layers, which can be divided into five convolutional layers and three fully connected layers.
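For reference, a simplified PyTorch sketch of this five-convolutional-plus-three-fully-connected layout is shown below; the channel sizes follow the original AlexNet paper, while normalization layers, dropout, and the original two-GPU split are omitted for brevity.

```python
import torch.nn as nn

# Five convolutional layers (with interleaved max pooling) ...
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
)
# ... followed by three fully connected layers (for 224x224 inputs the
# feature map entering the classifier is 256 x 6 x 6).
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000 ImageNet classes
)
alexnet_like = nn.Sequential(features, classifier)
```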

Scene classification network architecture search based on multi-objective neural evolution (SceneNet)

In order to achieve automatic design of the network architecture for HSR remote sensing scene classification, a framework of nature-inspired multi-objective neural evolution is proposed. In Fig. 4, the overall flowchart of the designed SceneNet approach is presented, and a more detailed description is provided in the later sections.

Experiments and analyses

To demonstrate the effectiveness of the designed approach, i.e., scene classification network architecture search based on multi-objective neural evolution (SceneNet), several state-of-the-art networks designed by human experts were compared with the proposed SceneNet method: AlexNet (Krizhevsky et al., 2012), VGG16 (Simonyan and Zisserman, 2014), ResNet34 (He et al., 2016), and GoogLeNet (Szegedy et al., 2015). The UC Merced (UCM) land-use dataset (Yang and Newsam, 2010), the NWPU RESISC45 (Cheng et

Conclusion and discussion

In this article, in order to provide an evolutionary scene image classification network for remote sensing datasets, a nature-inspired multi-objective neural evolution method (SceneNet) has been proposed. In SceneNet, the search space is encoded in the form of chromosomes, giving a more flexible and diverse representation. The results obtained in this study demonstrated that the proposed SceneNet algorithm can provide a competitive Pareto optimal solution set of scene classification networks for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB0504202, in part by the National Natural Science Foundation of China under Grants 41801267, 42071350, and 41771385, and in part by the Fundamental Research Funds for the Central Universities under Grant No. 2042020kf0014.

References (65)

  • A.K. Alok et al., Multi-objective semi-supervised clustering for automatic pixel classification from remote sensing imagery, Soft Comput. (2016)
  • Castelluccio, M., Poggi, G., Sansone, C., Verdoliva, L., 2015. Land use classification in remote sensing images by...
  • Z. Chen et al., Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (2018)
  • G. Cheng et al., Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images, IEEE Trans. Geosci. Remote Sens. (2015)
  • G. Cheng et al., Remote sensing image scene classification: Benchmark and state of the art, Proc. IEEE (2017)
  • G. Cheng et al., Exploring hierarchical convolutional features for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2018)
  • G. Cheng et al., Remote sensing image scene classification using bag of convolutional features, IEEE Geosci. Remote Sens. Lett. (2017)
  • Cheng, G., Xie, X., Han, J., Guo, L., Xia, G. S., 2020. Remote sensing image scene classification meets deep learning:...
  • G. Cheng et al., When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs, IEEE Trans. Geosci. Remote Sens. (2018)
  • C.C. Coello, Evolutionary multi-objective optimization: a historical view of the field, IEEE Comput. Intell. Mag. (2006)
  • M. Crepinšek et al., Exploration and exploitation in evolutionary algorithms: A survey, ACM Comput. Surv. (CSUR) (2013)
  • Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C., 2004. Visual categorization with bags of keypoints. In: 2004...
  • K. De Jong, Evolutionary computation: a unified approach
  • K. Deb et al., A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002)
  • T. Elsken et al., Neural architecture search: A survey, [online] Available (2018)
  • M. Gong et al., A multiobjective cooperative coevolutionary algorithm for hyperspectral sparse unmixing, IEEE Trans. Evol. Comput. (2017)
  • Z. Gong et al., Diversity-promoting deep structural metric learning for remote sensing scene classification, IEEE Trans. Geosci. Remote Sens. (2018)
  • J. Hafner et al., Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intell. (1995)
  • X. Han et al., Pre-trained AlexNet architecture with pyramid pooling and supervision for high spatial resolution remote sensing image scene classification, Remote Sens. (2017)
  • R.M. Haralick et al., Textural features for image classification, IEEE Trans. Syst., Man, Cybern. (1973)
  • K. He et al., Deep residual learning for image recognition
  • He, X., Zhao, K., Chu, X., 2019. AutoML: A survey of the state-of-the-art. arXiv preprint...