A novel convolutional neural network architecture for automatic material classification in multispectral remote sensing images

https://doi.org/10.1016/j.image.2021.116329

Highlights

  • A convolutional neural network (CNN) architecture classifies materials in multispectral remote sensing images, simplifying the construction of future models.

  • The robust U-Net (RUNet) model integrates multiple convolutional neural network architectures for material classification.

  • The RUNet model is based on an improved U-Net architecture combined with ResNet-style shortcut connections.

  • The encoding layer includes 10 convolutional layers and 4 pooling layers.

  • The decoding layer has 4 upsampling layers, 8 convolutional layers, and 1 classification convolutional layer.

Abstract

To simulate the real world, terrain models must combine various types of material and texture information during terrain reconstruction for three-dimensional numerical simulation. However, constructing such models with conventional methods often incurs high costs in both manpower and time. Therefore, this study used a convolutional neural network (CNN) architecture to classify materials in multispectral remote sensing images and thereby simplify the construction of future models. Visible light (i.e., RGB), near-infrared (NIR), normalized difference vegetation index (NDVI), and digital surface model (DSM) images were examined.

This paper proposes the robust U-Net (RUNet) model, which integrates multiple CNN architectures, for material classification. The model, which is based on an improved U-Net architecture combined with the shortcut connections of the ResNet model, preserves the features extracted by the shallow network. The architecture is divided into an encoding layer and a decoding layer. The encoding layer comprises 10 convolutional layers and 4 pooling layers. The decoding layer contains 4 upsampling layers, 8 convolutional layers, and 1 classification convolutional layer. The material classification process in this study involved the training and testing of the RUNet model. Because of the large size of remote sensing images, the training process randomly crops fixed-size subimages from the training set and then inputs them into the RUNet model for training. To consider the spatial information of the material, the test process cuts multiple test subimages from the test set through mirror padding and overlapping cropping; RUNet then classifies the subimages. Finally, the subimage classification results are merged back into the original test image.
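The paper's code is not reproduced here; for orientation, the following is a minimal PyTorch sketch of the kind of architecture described: a U-Net-style encoder-decoder whose convolutional blocks carry ResNet-style shortcut connections, matching the stated layer counts. The channel widths, input channel count, class count, and the `ResBlock`/`RUNetSketch` names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a RUNet-like model: a U-Net encoder-decoder with
# ResNet-style shortcuts. Encoder: 5 blocks x 2 convs = 10 convolutions and
# 4 poolings; decoder: 4 upsamplings, 4 blocks x 2 = 8 convolutions, plus
# 1 classification convolution (1x1 shortcut projections not counted).
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut connection (ResNet-style)."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, padding=1)
        self.conv2 = nn.Conv2d(cout, cout, 3, padding=1)
        self.skip = nn.Conv2d(cin, cout, 1) if cin != cout else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.relu(self.conv1(x))
        h = self.conv2(h)
        return self.relu(h + self.skip(x))  # shortcut preserves shallow features

class RUNetSketch(nn.Module):
    def __init__(self, in_ch=4, n_classes=6, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.enc = nn.ModuleList()
        c = in_ch
        for w in widths:                         # encoder blocks
            self.enc.append(ResBlock(c, w))
            c = w
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        for w in reversed(widths[:-1]):          # 4 upsampling stages
            self.up.append(nn.ConvTranspose2d(c, w, 2, stride=2))
            self.dec.append(ResBlock(w * 2, w))  # *2: concatenated skip features
            c = w
        self.head = nn.Conv2d(c, n_classes, 1)   # classification convolution

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:            # pool after all but deepest block
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([skip, up(x)], dim=1))
        return self.head(x)                      # per-pixel class logits
```

As a usage check, `RUNetSketch()(torch.randn(1, 4, 256, 256))` yields a `(1, 6, 256, 256)` tensor of per-pixel logits; input sides must be divisible by 16 because of the four poolings.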

The aerial image labeling dataset of the National Institute for Research in Digital Science and Technology (Inria, abbreviated from the French Institut national de recherche en sciences et technologies du numérique) was used, along with a reconfigured version of it (Inria-2) and a dataset from the International Society for Photogrammetry and Remote Sensing (ISPRS). Material classification was performed with RUNet. Moreover, the effects of mirror padding and overlapping cropping were analyzed, as was the impact of subimage size on classification performance. The best results were obtained on the Inria dataset; after the morphological optimization of RUNet, the overall intersection over union (IoU) and classification accuracy reached 70.82% and 95.66%, respectively. On the Inria-2 dataset, the IoU and accuracy were 75.5% and 95.71%, respectively, after classification refinement. Although this overall IoU and accuracy were 0.46% and 0.04% lower than those of the improved fully convolutional network, the training time of the RUNet model was approximately 10.6 h shorter. In the ISPRS dataset experiment, the overall accuracy of the combined multispectral, NDVI, and DSM images reached 89.71%, surpassing that of the RGB images alone. NIR and DSM provide more information on material features, reducing the likelihood of misclassification caused by similar features (e.g., in color, shape, or texture) in RGB images. Overall, RUNet outperformed the other models in the material classification of remote sensing images. The present findings indicate that it has potential for application in land use monitoring and disaster assessment as well as in model construction for simulation systems.
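The IoU and overall accuracy figures above are the standard semantic segmentation metrics. A generic sketch of how they can be computed from a confusion matrix follows (this is not code from the paper; the function name and argument conventions are illustrative):

```python
# Per-class IoU and overall pixel accuracy from a confusion matrix;
# a generic sketch, not the authors' evaluation code.
import numpy as np

def segmentation_metrics(pred, target, n_classes):
    """pred, target: integer label maps of identical shape."""
    mask = (target >= 0) & (target < n_classes)
    cm = np.bincount(
        n_classes * target[mask].astype(int) + pred[mask].astype(int),
        minlength=n_classes ** 2,
    ).reshape(n_classes, n_classes)        # rows: ground truth, cols: prediction
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)        # per-class intersection over union
    accuracy = tp.sum() / cm.sum()         # overall pixel accuracy
    return iou, accuracy
```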

Introduction

Simulation systems, which realistically reproduce and visualize real environments, are used in military training, aerospace technology, urban planning, and disaster response. For the operating interface of a simulation system to be complete, the environment displayed must be consistent with the actual scene. Because location, altitude, viewing angle, day and night lighting, and weather all vary, constructing three-dimensional terrain images is time and effort intensive; this is a key issue in the reconstruction of terrain and topography, which requires developing a terrain model from images generated by a visual system. Although such a model is a virtual construct, its content is real: it is a structured representation of the physical environment.

Realism is essential in simulation. Consider the construction of a house model: how to use textural information from the roof and wall surfaces, fitting the textured images to the three-dimensional surface of the house, to achieve a realistic effect is worth exploring. The purpose of such nesting is to obtain more specific information, as is the integration of resources from different sources. Combinations of two-dimensional (2D) with 2D data and of 2D with three-dimensional (3D) data can be roughly classified as image to image [1], [2], image to topographic map, image to model, image to light detection and ranging (LiDAR) [3], and topographic map to LiDAR.

Image fitting, a technique commonly used in simulation, involves high construction and labor costs because differences in environmental conditions, such as position, height, viewing angle, perspective, day and night lighting, and weather, correspond to variations in the spatial coordinates, texture images, size, and color of the terrain model. One challenge is having the terrain model automatically generate night vision images at night instead of manually building a separate set of terrain models. Using high-spatial-resolution image technology to classify terrain material (i.e., its characteristics, composition, and structure), combined with the spectral information of each point, can provide ground-level images with more detailed information.

In general, optical images have 4 to 50 bands and a spectral resolution of 100 nm. Hyperspectral images can have 100 to 250 bands with a spectral resolution of approximately 3 nm (5 nm in general). Higher spectral resolutions facilitate feature identification, but the resulting data are complex. Spectral responses vary by substance and wavelength, forming a so-called spectral signature, and the differences in such spectral patterns can serve as the basis for classifying features. Therefore, substances that are difficult to distinguish in general multispectral images, as well as mineral components with special spectral characteristics, can be detected in hyperspectral images. Although hyperspectral images offer many wavebands, they also comprise substantial amounts of data; coupled with the complexity of the equipment and the system, this makes such images expensive to obtain. Therefore, the present study used visible light (i.e., general RGB) and multispectral images to classify pixel materials. Visible light images have a spectral range of approximately 450 to 700 nm, whereas multispectral images contain blue, green, and red visible light as well as near-infrared (NIR) light, giving four bands with a spectral range of approximately 450 to 900 nm.
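The NDVI images examined in this study derive from the red and NIR bands just described: NDVI is the standard normalized ratio (NIR − red) / (NIR + red). A minimal sketch follows; the 4-band blue-green-red-NIR ordering is an assumption for illustration.

```python
# NDVI = (NIR - Red) / (NIR + Red), the standard vegetation index,
# computed from a 4-band (B, G, R, NIR) array. Band ordering is an
# assumption for illustration, not specified by the paper.
import numpy as np

def ndvi(image_bgrn):
    """image_bgrn: (H, W, 4) array; returns an (H, W) NDVI map in [-1, 1]."""
    red = image_bgrn[..., 2].astype(np.float64)
    nir = image_bgrn[..., 3].astype(np.float64)
    return (nir - red) / np.maximum(nir + red, 1e-6)  # guard division by zero
```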

With the gradual maturation of computer vision technology in recent years, image recognition, classification, and pixel classification methods have seen tremendous breakthroughs. The emergence of deep learning has greatly reduced the cost of manual feature extraction, and models can now effectively identify images. Apart from classifying remote sensing data with convolutional neural network (CNN) models, numerous studies have applied CNNs to the pixel classification of visible light and multispectral remote sensing images with high spatial resolution, enabling the classification of image features such as color, texture, and shape on a pixel-wise basis. This can be followed by the generation and subsequent classification of thermal (night vision) images based on the thermal energy emitted by the studied materials (Fig. 1).

Hyperspectral images contain more spectral bands, and their spectral resolution captures the reflection characteristics of the material spectrum. These reflections can be used to identify various features. The shape of the spectral curve varies with the local materials, and features and spectral radiation responses differ according to wavelength. Hyperspectral image analysis is thus the optimal approach to feature identification. However, as mentioned, hyperspectral images are expensive and difficult to obtain, and the data must be frequently updated. Therefore, a novel CNN architecture for automatic material classification based on multispectral remote sensing images is presented. The multispectral wavelength range is between 400 and 1040 nm (8 channels), whereas that of the full visible band is between 450 and 800 nm (3 channels).

The literature on model construction addresses model generation, data sources, and mapping methods. Studies on model generation involve the manual selection of image feature points [4], the manual selection of line segments with speculative features [5], and the extraction of terrain model boundaries [6]. Studies on data sources have examined ground-based images [7], aerial images [8], and combined images [9]. As for mapping methods, investigations have centered on applying small patches to flat and cylindrical surfaces [10], directly pasting real ground images on house walls [11], extracting texture images in perspective [12], pasting texture images on walls with a load [12], and pasting real images on a plane by using a perspective method [13].

Images contain numerous types of substances, which are identified and classified on the basis of characteristics such as color, texture, space, and shape. Image substances can be subjected to supervised or unsupervised classification. The supervised method uses training samples (i.e., data with known classes) to classify each input [13]. Unsupervised classification does not consider training sample data [14].

The literature contains studies on unsupervised classification methods, including K-means clustering [15], simple linear iterative clustering [16], the iterative self-organizing data analysis technique (ISODATA) [17], and fuzzy C-means clustering [18]. In recent years, some researchers have proposed the use of a CNN to build an unsupervised deep learning model [19]. Because supervised classification considers the categorical information of the training samples, it yields more accurate results than unsupervised classification. Furthermore, semi-supervised techniques based on both labeled and unlabeled samples have also been introduced [20].
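As a concrete illustration of the unsupervised route, K-means clustering [15] groups pixels purely by spectral similarity, with no labeled data. A minimal scikit-learn sketch follows; the cluster count and function name are illustrative assumptions.

```python
# Unsupervised pixel clustering with K-means: pixels are grouped by
# spectral similarity alone, with no labeled training data. A generic
# sketch; the cluster count is an arbitrary choice.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, n_clusters=5, seed=0):
    """image: (H, W, C) array; returns an (H, W) cluster-label map."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(pixels)
    return labels.reshape(h, w)
```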

Supervised learning methods include support vector machines, neural networks, and decision trees. With advances in deep learning, CNN models have had widespread use in various applications such as computer vision and natural language processing [21], with demonstrated effectiveness and feasibility. Along with image classification, CNNs have considerable use in semantic segmentation in the field of computer vision. Semantic segmentation is the classification of pixels in images. Numerous CNN-based material classification models have been proposed, including fully convolutional networks (FCNs) [22] for material classification; SegNet [23], a deep convolutional encoder–decoder architecture, for classifying streetscape images; U-Net [24] for classifying medical images; the multibias model (MBM) [25], which incorporates multibias into the rectified linear unit (ReLU) layer, for classifying hyperspectral images; the region-enhanced CNN (ReCNN) model [26], which detects objects in remote sensing images; and the deep spatial–spectral subspace clustering network (DS3C-Net) [27].

A study on telemetry images combined geometric edge feature extraction with multiscale road detection [28]; moreover, high-resolution visible light images were combined with Gabor, morphological filtering, and simple linear iterative aggregation. In another study, class- and region-adjacent graphs were used, and a hierarchical graph–based method for classifying roads was proposed [29]. Furthermore, models for CNN-based image and material classification have been developed: a dual-flow architecture for textural feature extraction and classification [30], a model for the classification of roads and buildings [31], and a comprehensive approach for scene and material classification [32].

Although multispectral remote sensing images are relatively inexpensive and easy to obtain and can be subjected to material classification, the following challenges remain:

1. Representing surface material characteristics in multispectral remote sensing images is difficult.

2. Conventional methods are not suitable for classifying multispectral data.

3. The correlations among the multiple channels of multispectral images are difficult to establish.

4. Training on remote sensing images is difficult because of their large size.

5. For materials whose appearance varies widely across images, obtaining complete image coverage during training is difficult.

6. The amount of each material present in a remote sensing image affects the classification results.

7. Uncertainty is easily introduced in material feature extraction and classification, negatively affecting classification accuracy. In hyperspectral images, uncertainty can stem from atmospheric conditions, radiation, and spatial resolution.

8. When optical data with low-to-moderate spatial resolution are processed, mixed pixels between land cover classes affect classification and recognition.

As mentioned, simulation systems use terrain models, which are costly to build, to realistically present the environment. To ensure that the same set of terrain models covers terrain images captured under varying conditions (e.g., daylight and night), information on object materials must be incorporated into the terrain model. Material identification is relatively labor and time intensive, so identifying the materials in images beforehand can save substantial model-building costs. The present study used deep learning to classify pixels in visible light and multispectral remote sensing images, and a material classification model involving a multi-CNN architecture is proposed. Material classification comprises two main procedures: model training and model testing. During training, the model derives weights suitable for material classification; during testing, the image classification results are obtained from the trained model.

The novel CNN architecture for material classification presented in this study makes multiple contributions. First, the material classification results enable conversion to thermal (night vision) images. Second, the architecture can replace hyperspectral images, which, as mentioned, are expensive and difficult to obtain and must be frequently updated. Third, the costs of terrain model construction are reduced. Fourth, users can rapidly obtain information on topographical changes from the material classification results. Finally, the classification accuracy of this architecture surpasses that of conventional and similar neural network architectures. In the future, multi-size images and street view images will be combined to further enhance the accuracy of material classification.

Related work

The material classification model in the present study performs semantic segmentation, in which image pixels are classified. This is similar to a CNN used to classify an entire image, except that the material classification model does not have a fully connected layer. The CNN process begins with convolution and pooling, which connect the image to its features; independent analysis is then conducted. CNNs have three layers.
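To make the convolution-then-pooling step concrete, here is a minimal PyTorch example; the layer sizes are arbitrary illustrations, not values from the paper.

```python
# Minimal convolution + pooling step, the first stage of the CNN pipeline
# described above; layer sizes are arbitrary illustrations.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)                              # one 3-channel 64x64 image
features = nn.Conv2d(3, 16, kernel_size=3, padding=1)(x)   # -> (1, 16, 64, 64)
features = nn.ReLU()(features)                             # nonlinearity
pooled = nn.MaxPool2d(2)(features)                         # -> (1, 16, 32, 32)
```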

Proposed method

In this study, a novel CNN-based model was used to classify substances in remote sensing images. The processing flow is divided into the training and testing of the material classification model. Robust U-Net (RUNet), a novel neural network architecture, is proposed for material classification as follows. The model is based on the improved U-Net architecture combined with the shortcut connections of ResNet. During training, the training images are preprocessed: multiple fixed-size subimages are randomly cropped from each training image.
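The random-cropping preprocessing can be sketched as follows; the subimage size is a placeholder, and this is not the authors' code.

```python
# Random cropping of fixed-size training subimages from a large remote
# sensing image and its pixel-label map; sizes are placeholders.
import numpy as np

def random_crop(image, labels, size=256, rng=None):
    """image: (H, W, C) array; labels: (H, W) array. Returns an aligned pair."""
    rng = rng or np.random.default_rng()
    h, w = labels.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return (image[top:top + size, left:left + size],
            labels[top:top + size, left:left + size])
```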

Experiment

This section describes the experimental environment and the image datasets, then presents the evaluation criteria and discusses the effectiveness of the experiments.

Conclusions

In this study, RUNet, a novel material classification model that combines multiple CNN architectures, was proposed. RUNet classifies pixels in multispectral remote sensing images and has the potential to reduce the costs (e.g., manpower) involved in terrain model construction. The material classification process is divided into model training and testing, with images preprocessed through random cropping during training. Testing involves image preprocessing through mirror padding and overlapping cropping.
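The test-time mirror padding and overlapping cropping can be sketched as below: the image is reflect-padded so border pixels gain surrounding context, overlapping tiles are classified, and only each tile's central region is written back when merging. Tile and stride sizes are placeholders, and `model` stands for any per-pixel classifier; this is an assumption-laden sketch, not the paper's implementation.

```python
# Test-time tiling sketch: reflect-pad, classify overlapping tiles, and
# keep only each tile's center when merging, so every pixel is predicted
# with context. Assumes `model` maps an (h, w, C) window to an (h, w)
# label map and accepts smaller windows at the image borders.
import numpy as np

def tiled_predict(model, image, tile=256, stride=128):
    pad = (tile - stride) // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.int64)
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            window = padded[top:top + tile, left:left + tile]
            pred = model(window)                  # per-pixel labels for the tile
            ch = min(stride, h - top)             # clip center region at borders
            cw = min(stride, w - left)
            out[top:top + ch, left:left + cw] = pred[pad:pad + ch, pad:pad + cw]
    return out
```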

CRediT authorship contribution statement

Chuen-Horng Lin: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Ting-You Wang: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant No. MOST 109-2221-E-025-010.

References (47)

  • Habib, A., et al., Co-registration of photogrammetric and LIDAR data: Methodology and case study, Rev. Bras. Cartogr. (2004)

  • Rau, J.Y., et al., Robust reconstruction of building models from three-dimensional line segments, Photogramm. Eng. Remote Sens. (2003)

  • Chen, L.C., et al., Building reconstruction from LIDAR data and aerial imagery

  • Tsai, F., et al., Polygon-based texture mapping for cyber city 3D building models, Int. J. Geogr. Inf. Sci. (2007)

  • Wu, J., et al., Automatic retrieval of optimal texture from aerial video for photo-realistic 3D visualization of street environment

  • Früh, C., et al., Constructing 3D city models by merging aerial and ground views, IEEE Comput. Graph. Appl. (2003)

  • Catmull, E., Computer display of curved surfaces, in: Proceedings of the IEEE Conference on Computer Graphics, Pattern...

  • Chon, J., et al., Urban visualization through video mosaics based on 3-D multibaselines, Int. Arch. Photogramm. Remote Sens. (2004)

  • Frueh, C., et al., Data processing algorithms for generating textured 3D building facade meshes from laser scans and camera images, Int. J. Comput. Vis. (2005)

  • Ghamisi, P., et al., Feature selection based on hybridization of genetic algorithm and particle swarm optimization, IEEE Geosci. Remote Sens. Lett. (2015)

  • MacQueen, J., Some methods for classification and analysis of multivariate observations, in: Proceedings of the fifth...

  • Achanta, R., et al., SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell. (2012)

  • Ball, G.H., et al., ISODATA, a novel method of data analysis and classification, Tech. Rep. (1965)
    View full text