SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search

https://doi.org/10.1016/j.isprsjprs.2020.11.025

Abstract

Scene classification approaches based on deep learning have attracted much attention for remote sensing imagery. However, most deep learning networks are built with fixed architectures designed for natural image processing, and they are difficult to apply directly to remote sensing images because of their more complex geometric and structural features. There is therefore an urgent need to automatically search for the most suitable neural network architecture from the scene classification image data itself, which requires a powerful search mechanism, and the computational complexity and performance error of the searched network should be balanced to provide a practical choice. In this article, a framework for scene classification network architecture search based on multi-objective neural evolution (SceneNet) is proposed. In SceneNet, the network architecture coding and searching are achieved using an evolutionary algorithm, which allows a more flexible hierarchical extraction of the remote sensing image scene information. Moreover, the computational complexity and the performance error of the searched network are balanced by employing a multi-objective optimization method, and the competitive neural architectures are obtained in a Pareto solution set. The effectiveness of SceneNet is demonstrated by experimental comparisons with several deep neural networks designed by human experts.

Introduction

Scene classification refers to distinguishing the different semantic categories of remote sensing images, i.e., the different features and spatial distributions reflected in the images (Cheng et al., 2017a). Compared with image interpretation at the pixel and object levels, scene-level classification considers the different spatial distribution modes of the objects in high spatial resolution (HSR) remote sensing images (Xia et al., 2017, Zhu et al., 2018). Scene classification has various applications, especially land-use identification and urban planning (Cheng et al., 2015, Zhao et al., 2019). However, it remains an arduous task due to the complex spatial and structural patterns of remote sensing images.

Scene classification has developed rapidly in recent years. Compared with pixel-level classification and object detection, remote sensing scene classification emphasizes the semantic label information, and can provide richer social semantic attributes (Zhao et al., 2016), such as airport, industrial area, commercial area, and golf course. The scene classification methods can be divided into three categories: low-level, middle-level, and high-level methods. The low-level features used in scene classification include the color histogram (Hafner et al., 1995), local binary patterns (LBPs) (Ojala et al., 2002), and the gray-level co-occurrence matrix (GLCM) (Haralick et al., 1973). The middle-level methods, such as the bag-of-visual-words (BoVW) model, are an important way to extract the visual descriptors of the scenes (Zhu et al., 2016, Csurka et al., 2004). However, the high-level features are usually ignored in these traditional approaches based on handcrafted features.
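As a brief illustration of the low-level descriptors mentioned above, the following sketch computes an LBP histogram and a GLCM for a random grayscale patch using scikit-image (assuming a recent version that exposes graycomatrix); the patch and parameter values are purely illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix

# A random grayscale patch stands in for a remote sensing scene image.
patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

# Local binary pattern histogram (Ojala et al., 2002): uniform patterns
# with 8 neighbors at radius 1 give values in the range 0..9.
lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

# Gray-level co-occurrence matrix (Haralick et al., 1973) for one
# displacement (distance 1, angle 0).
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256, symmetric=True)

print(lbp_hist.shape, glcm.shape)  # (10,) (256, 256, 1, 1)
```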

Recently, deep learning based methods have played a vital role in extracting high-level features, and have emerged as a dominant paradigm in pattern recognition and computer vision (Chen and Liu, 2018). In deep learning technology, convolutional neural networks (CNNs) can be regarded as typical data-driven methods, and are powerful tools that can be used to discover the intricate structures and extract the essential information of HSR remote sensing imagery (Zou et al., 2015), in addition to the hierarchical convolutional features of hyperspectral imagery (Cheng et al., 2018a). Networks developed for image classification on ImageNet (Krizhevsky et al., 2012), such as GoogLeNet (Szegedy et al., 2015) and CaffeNet (Jia et al., 2014), have been employed for scene classification (Castelluccio et al., 2015). Furthermore, a series of deep learning based scene classification approaches have been proposed (Cheng et al., 2020). For example, to replace the traditional handcrafted feature descriptors, Cheng et al. (2017b) proposed a scene classification method based on a bag of convolutional features; Cheng et al. (2018b) proposed learning a discriminative CNN to address the inter-class similarity and intra-class diversity in scene image classification; Lu et al. (2019b) designed a scene classification CNN that aggregates end-to-end features; and Gong et al. (2018) put forward a diversity-enhanced metric learning approach for deep structures in HSR image scene classification. Other deep learning based approaches for scene image classification have also been developed by Anwer et al. (2018), Lu et al. (2019a), Han et al. (2018), and Zhu et al. (2019).

However, in order to design a satisfactory deep CNN that can extract the different levels of image information for scene semantic classification, comprehensive domain knowledge of both remote sensing image interpretation and deep learning in computer vision is required (Wang et al., 2020), and strenuous effort must be devoted to the design of the network structure by human experts. Thus, it is natural to consider whether a computer could be used to automatically search for and obtain a suitable data-driven network. Fortunately, thanks to the rapid development of graphics processing units (GPUs) and other hardware, computing power has greatly improved. The Google automatic machine learning (AutoML) platform is a suite of machine learning products, in which the implementation mechanisms are collectively referred to as neural architecture search (NAS) (He et al., 2019, He et al., 2018). The principal idea of NAS can be summarized in three steps: 1) definition of the search space; 2) the search strategy, through which candidate network structures are found and then evaluated; and 3) the performance estimation strategy, in which the next iteration is carried out according to the feedback (Elsken et al., 2018). In the field of natural image interpretation, NAS methods have outperformed manually designed architectures in natural image classification (Zoph and Le, 2017; Zoph et al., 2018; Liu et al., 2018) and semantic segmentation (Liu et al., 2019). This shows that the structure of the network can be established from the characteristics of the dataset itself. Moreover, the search strategy plays an important role in NAS, and can be categorized into three types (Lu et al., 2019c; Xie et al., 2018): 1) gradient-based (GB) search; 2) reinforcement learning (RL) based search; and 3) evolutionary algorithms (EAs). Real et al. (2019) conducted a case study of the different search strategies, and found that the RL- and EA-based methods obtain similar performances, with both performing better than the GB-based methods; in addition, the RL-based methods require more computing resources, while the EA-based methods can obtain smaller models. In fact, EA-based NAS methods, also referred to as neuroevolution methods, have been a topic of interest for some time. Back in the 1990s, Yao and colleagues (Yao, 1993; Yao and Liu, 1997; Yao, 1999) suggested that neuroevolution is a different kind of deep learning. Since then, studies combining evolutionary methods with artificial neural networks (ANNs) have attracted the attention of scholars, and neuroevolution has progressed from shallow architecture search to deep network architecture search. Thus, EA-based NAS is further discussed in this article.
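As a minimal, illustrative sketch of these three steps (not the method used in SceneNet), the following toy Python loop samples architectures from a small search space, scores them with a stand-in performance estimator, and feeds the result back into a plain random-search strategy; all names and values are hypothetical.

```python
import random

# 1) Search space: each architecture is a list of (operation, width) choices.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]
SEARCH_SPACE = {"depth": range(3, 9), "ops": OPS, "widths": [32, 64, 128]}

def sample_architecture(space):
    depth = random.choice(list(space["depth"]))
    return [(random.choice(space["ops"]), random.choice(space["widths"]))
            for _ in range(depth)]

# 3) Performance estimation: a placeholder that would normally train the
# candidate on the scene dataset and return its validation error.
def estimate_performance(arch):
    return random.random()  # stand-in proxy error, not real training

# 2) Search strategy: plain random search as the simplest possible baseline.
def random_search(n_trials=20):
    best_arch, best_err = None, float("inf")
    for _ in range(n_trials):
        arch = sample_architecture(SEARCH_SPACE)
        err = estimate_performance(arch)
        if err < best_err:           # feedback drives the next iteration
            best_arch, best_err = arch, err
    return best_arch, best_err

if __name__ == "__main__":
    arch, err = random_search()
    print(len(arch), "layers, proxy error:", round(err, 3))
```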

Learning from biological evolution and natural selection, the EA-based NAS methods attempt to explain the connection between the CNN structure and natural evolution (Xie and Yuille, 2017). With the rapid development of deep CNNs, EA-based architecture search is attracting more and more attention in the artificial intelligence community (Real et al., 2019, Wang et al., 2019). EA-based architecture search attempts to evolve, design, and build the neural network through the EA, instead of through stochastic gradient descent and manual design. Moreover, EAs have also played a significant role in traditional machine learning for remote sensing image interpretation (Zhong et al., 2018), in applications such as remote sensing image clustering (Wan et al., 2019; Alok et al., 2016), subpixel mapping (Song et al., 2019), sparse unmixing (Gong et al., 2017), and change detection (Song et al., 2018). Thus, EAs represent a potential solution for the automatic search for a satisfactory deep CNN for scene classification. In addition, there is a lack of networks automatically evolved and searched from remote sensing data in image interpretation, which could avoid the need for arduous manual tuning.
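The following toy sketch illustrates the general idea of evolutionary architecture search: encoded architectures are varied by mutation and filtered by selection, rather than being designed by hand or trained into shape by gradient descent alone. It is a generic illustration under an assumed encoding and a placeholder fitness function, not the SceneNet algorithm itself.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_genome(length=6):
    # A genome here is simply a fixed-length list of layer operations.
    return [random.choice(OPS) for _ in range(length)]

def mutate(genome, rate=0.2):
    # Each gene is resampled with a small probability (variation operator).
    return [random.choice(OPS) if random.random() < rate else g for g in genome]

def fitness(genome):
    # Placeholder: in practice, decode the genome into a CNN, train it on the
    # scene dataset, and return the validation accuracy.
    return random.random()

def evolve(pop_size=10, generations=5):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]            # selection
        children = [mutate(random.choice(parents))   # variation
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```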

The objective functions optimized in the EA are also important. In addition to using the test accuracy to evaluate a neural network for image interpretation, the computational complexity of the network should also be considered for a comprehensive evaluation (Lu et al., 2019c; Tan et al., 2019). The computational complexity and the test error/accuracy therefore need to be balanced, so that a competitive solution set can be obtained. Fortunately, the population-based EAs provide an ideal tool for multi-objective optimization problems (Coello, 2006), and multi-objective evolutionary optimization methods have been successfully employed in traditional machine learning methods for remote sensing image interpretation, in applications such as image clustering (Ma et al., 2015), subpixel mapping (Song et al., 2019, Ma et al., 2018), hyperspectral feature selection (Zhang et al., 2018), and hyperspectral image sparse unmixing (Xu and Shi, 2017). Thus, for a NAS-based scene classification network, multi-objective optimization of the test error and the computational complexity should be considered in the evolutionary NAS, to provide a non-dominated choice in a competitive solution set.
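A minimal sketch of this bi-objective idea follows, assuming each candidate network is summarized by a (test error, FLOPs) pair with made-up values: the non-dominated candidates form the Pareto set offered to the user.

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    # candidates: dict of name -> (test_error, flops); keep the non-dominated ones.
    front = []
    for name, score in candidates.items():
        if not any(dominates(other, score)
                   for other_name, other in candidates.items()
                   if other_name != name):
            front.append(name)
    return front

# Illustrative values only: net_c is dominated by net_b in both objectives.
networks = {
    "net_a": (0.08, 4.2e9),   # accurate but computationally heavy
    "net_b": (0.12, 0.9e9),   # lighter but less accurate
    "net_c": (0.13, 1.5e9),   # dominated by net_b
}
print(pareto_front(networks))  # ['net_a', 'net_b']
```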

In this article, in order to automatically search for a satisfactory network for scene classification from the image dataset itself, a framework of scene classification network architecture search based on multi-objective neural evolution is proposed. The main contributions of this article are summarized below:

1) A framework of scene classification network architecture search based on multi-objective neural evolution. The proposed SceneNet is an EA-based NAS approach for the remote sensing image scene classification task, which has not been achieved in the existing studies. In SceneNet, the most suitable network can be automatically searched from the dataset itself, without requiring handcrafted design, and the computational complexity and the performance error of the searched network can be balanced adaptively.

2) Evolutionary algorithm based flexible extraction of the scene information and a powerful search capability. In SceneNet, the connection modes of the network architecture are encoded as binary strings in the EA, and a search space is defined, which allows the different connection modes between the convolutional layers to be tested flexibly (see the sketch after this list), so that a more flexible hierarchical extraction of the remote sensing image scene information can be achieved for a better classification result. Moreover, the powerful exploitation and exploration search capabilities for the network architecture can be attributed to the global search capability of the EA and the local search capability of the Bayesian optimization algorithm (BOA) (Pelikan et al., 1999).

3) The multi-objective trade-off for network design. For real-world deployment, the computational complexity should be optimized simultaneously with the accuracy, so the number of floating-point operations (FLOPs) is taken as the measure of computational complexity. The accuracy and the computational complexity are thus balanced in the proposed SceneNet method by utilizing multi-objective optimization. The non-dominated network individuals and a competitive Pareto optimal solution set are then obtained, providing the user with practical choices.
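The sketch below, referenced in contribution 2 above, illustrates one generic way a binary string can describe connection modes between convolutional layers; it is a hypothetical encoding for illustration, not necessarily the exact scheme used in SceneNet.

```python
import itertools

def decode_connections(bits, n_layers):
    """Map a binary string onto the candidate layer pairs (i, j) with i < j:
    bit k decides whether the connection for pair k is enabled."""
    pairs = list(itertools.combinations(range(n_layers), 2))
    assert len(bits) == len(pairs), "one bit per candidate connection"
    return [pair for bit, pair in zip(bits, pairs) if bit == "1"]

# Example: 4 convolutional layers -> 6 candidate connections.
genome = "101001"
print(decode_connections(genome, n_layers=4))
# [(0, 1), (0, 3), (2, 3)]
```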

The rest of this paper is organized as follows. The related research background is presented in Section 2. The proposed SceneNet method is introduced in Section 3. Section 4 describes the experiments undertaken in this study. A discussion and our conclusions are provided in Section 5.

Section snippets

The classic CNN classification networks

With regard to CNN classification networks, many successful examples have been applied to scene classification, with the aim of capturing the global information of the remote sensing imagery. As shown in Fig. 1(a), AlexNet, which was designed by Krizhevsky et al. (2012), obtained first place in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012). AlexNet has eight layers, which can be divided into five convolutional layers and three fully connected layers.
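For reference, a simplified PyTorch sketch of this five-convolutional-plus-three-fully-connected layout is shown below; the channel sizes follow the original AlexNet paper, while normalization layers, dropout, and the original two-GPU split are omitted for brevity.

```python
import torch.nn as nn

# Five convolutional layers (with interleaved max pooling) ...
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
)
# ... followed by three fully connected layers (for 224x224 inputs the
# feature map entering the classifier is 256 x 6 x 6).
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000 ImageNet classes
)
alexnet_like = nn.Sequential(features, classifier)
```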

Scene classification network architecture search based on multi-objective neural evolution (SceneNet)

In order to achieve automatic design of the network architecture for HSR remote sensing scene classification, a framework of nature-inspired multi-objective neural evolution is proposed. In Fig. 4, the overall flowchart of the designed SceneNet approach is presented, and a more detailed description is provided in the later sections.

Experiments and analyses

To demonstrate the effectiveness of the designed approach, i.e., scene classification network architecture search based on multi-objective neural evolution (SceneNet), several state-of-the-art networks designed by human experts were compared with the proposed SceneNet method: AlexNet (Krizhevsky et al., 2012), VGG16 (Simonyan and Zisserman, 2014), ResNet34 (He et al., 2016), and GoogLeNet (Szegedy et al., 2015). The UC Merced (UCM) land-use dataset (Yang and Newsam, 2010), the NWPU RESISC45 (Cheng et

Conclusion and discussion

In this article, in order to provide an evolutionary scene image classification network for remote sensing datasets, a nature-inspired multi-objective neural evolution method (SceneNet) has been proposed. In SceneNet, the search space is encoded in the form of chromosomes, giving a more flexible and diverse representation. The results obtained in this study demonstrated that the proposed SceneNet algorithm can provide a competitive Pareto optimal solution set of scene classification networks for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB0504202, in part by the National Natural Science Foundation of China under Grants 41801267, 42071350, and 41771385, and in part by the Fundamental Research Funds for the Central Universities under Grant No. 2042020kf0014.

References (65)

  • A.K. Alok et al., Multi-objective semi-supervised clustering for automatic pixel classification from remote sensing imagery, Soft Comput. (2016)
  • Castelluccio, M., Poggi, G., Sansone, C., Verdoliva, L., 2015. Land use classification in remote sensing images by...
  • Z. Chen et al., Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (2018)
  • G. Cheng et al., Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images, IEEE Trans. Geosci. Remote Sens. (2015)
  • G. Cheng et al., Remote sensing image scene classification: Benchmark and state of the art, Proc. IEEE (2017)
  • G. Cheng et al., Exploring hierarchical convolutional features for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2018)
  • G. Cheng et al., Remote sensing image scene classification using bag of convolutional features, IEEE Geosci. Remote Sens. Lett. (2017)
  • Cheng, G., Xie, X., Han, J., Guo, L., Xia, G. S., 2020. Remote sensing image scene classification meets deep learning:...
  • G. Cheng et al., When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs, IEEE Trans. Geosci. Remote Sens. (2018)
  • C.C. Coello, Evolutionary multi-objective optimization: a historical view of the field, IEEE Comput. Intell. Mag. (2006)
  • M. Crepinšek et al., Exploration and exploitation in evolutionary algorithms: A survey, ACM Comput. Surv. (CSUR) (2013)
  • Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C., 2004. Visual categorization with bags of keypoints. In: 2004...
  • K. De Jong, Evolutionary computation: a unified approach
  • K. Deb et al., A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002)
  • T. Elsken et al., Neural architecture search: A survey, [online] Available (2018)
  • M. Gong et al., A multiobjective cooperative coevolutionary algorithm for hyperspectral sparse unmixing, IEEE Trans. Evol. Comput. (2017)
  • Z. Gong et al., Diversity-promoting deep structural metric learning for remote sensing scene classification, IEEE Trans. Geosci. Remote Sens. (2018)
  • J. Hafner et al., Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intell. (1995)
  • X. Han et al., Pre-trained AlexNet architecture with pyramid pooling and supervision for high spatial resolution remote sensing image scene classification, Remote Sens. (2017)
  • R.M. Haralick et al., Textural features for image classification, IEEE Trans. Syst., Man, Cybern. (1973)
  • K. He et al., Deep residual learning for image recognition
  • He, X., Zhao, K., Chu, X., 2019. AutoML: A survey of the state-of-the-art. arXiv preprint...