A novel convolutional neural network architecture for automatic material classification in multispectral remote sensing images

https://doi.org/10.1016/j.image.2021.116329

Highlights

  • A convolutional neural network (CNN) architecture classifies materials in multispectral remote sensing images, simplifying the construction of future models.

  • The robust U-Net (RUNet) model integrates multiple convolutional neural network architectures for material classification.

  • The RUNet model is based on an improved U-Net architecture combined with ResNet-style shortcut connections.

  • The encoding layer includes 10 convolutional layers and 4 pooling layers.

  • The decoding layer has 4 upsampling layers, 8 convolutional layers, and 1 classification convolutional layer.

Abstract

To simulate the real world, terrain models must combine various types of material and texture information during terrain reconstruction for three-dimensional numerical simulation. However, constructing such models with conventional methods often incurs high costs in both manpower and time. Therefore, this study used a convolutional neural network (CNN) architecture to classify materials in multispectral remote sensing images and thereby simplify the construction of future models. Visible light (i.e., RGB), near-infrared (NIR), normalized difference vegetation index (NDVI), and digital surface model (DSM) images were examined.

This paper proposes the robust U-Net (RUNet) model, which integrates multiple CNN architectures, for material classification. The model, which is based on an improved U-Net architecture combined with the shortcut connections of the ResNet model, preserves the features extracted by the shallow network. The architecture is divided into an encoding layer and a decoding layer. The encoding layer comprises 10 convolutional layers and 4 pooling layers. The decoding layer contains 4 upsampling layers, 8 convolutional layers, and 1 classification convolutional layer. The material classification process in this study involved the training and testing of the RUNet model. Because of the large size of remote sensing images, the training process randomly crops fixed-size subimages from the training set and then inputs them into the RUNet model for training. To consider the spatial information of the material, the test process cuts multiple test subimages from the test set through mirror padding and overlapping cropping; RUNet then classifies the subimages. Finally, the subimage classification results are merged back into the original test image.
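The paper's code is not reproduced here; for orientation, the following is a minimal PyTorch sketch of the kind of architecture described: a U-Net-style encoder-decoder whose convolutional blocks carry ResNet-style shortcut connections, matching the stated layer counts. The channel widths, input channel count, class count, and the `ResBlock`/`RUNetSketch` names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a RUNet-like model: a U-Net encoder-decoder with
# ResNet-style shortcuts. Encoder: 5 blocks x 2 convs = 10 convolutions and
# 4 poolings; decoder: 4 upsamplings, 4 blocks x 2 = 8 convolutions, plus
# 1 classification convolution (1x1 shortcut projections not counted).
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut connection (ResNet-style)."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, padding=1)
        self.conv2 = nn.Conv2d(cout, cout, 3, padding=1)
        self.skip = nn.Conv2d(cin, cout, 1) if cin != cout else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.relu(self.conv1(x))
        h = self.conv2(h)
        return self.relu(h + self.skip(x))  # shortcut preserves shallow features

class RUNetSketch(nn.Module):
    def __init__(self, in_ch=4, n_classes=6, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.enc = nn.ModuleList()
        c = in_ch
        for w in widths:                         # encoder blocks
            self.enc.append(ResBlock(c, w))
            c = w
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        for w in reversed(widths[:-1]):          # 4 upsampling stages
            self.up.append(nn.ConvTranspose2d(c, w, 2, stride=2))
            self.dec.append(ResBlock(w * 2, w))  # *2: concatenated skip features
            c = w
        self.head = nn.Conv2d(c, n_classes, 1)   # classification convolution

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:            # pool after all but deepest block
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([skip, up(x)], dim=1))
        return self.head(x)                      # per-pixel class logits
```

As a usage check, `RUNetSketch()(torch.randn(1, 4, 256, 256))` yields a `(1, 6, 256, 256)` tensor of per-pixel logits; input sides must be divisible by 16 because of the four poolings.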

The aerial image labeling dataset of the National Institute for Research in Digital Science and Technology (Inria, abbreviated from the French Institut national de recherche en sciences et technologies du numérique) was used, along with a reconfigured version of it (Inria-2) and a dataset from the International Society for Photogrammetry and Remote Sensing (ISPRS). Material classification was performed with RUNet. Moreover, the effects of mirror padding and overlapping cropping were analyzed, as was the impact of subimage size on classification performance. The best results were obtained on the Inria dataset; after the morphological optimization of RUNet, the overall intersection over union (IoU) and classification accuracy reached 70.82% and 95.66%, respectively. On the Inria-2 dataset, the IoU and accuracy were 75.5% and 95.71%, respectively, after classification refinement. Although this overall IoU and accuracy were 0.46% and 0.04% lower than those of the improved fully convolutional network, the training time of the RUNet model was approximately 10.6 h shorter. In the ISPRS dataset experiment, the overall accuracy of the combined multispectral, NDVI, and DSM images reached 89.71%, surpassing that of the RGB images alone. NIR and DSM provide more information on material features, reducing the likelihood of misclassification caused by similar features (e.g., in color, shape, or texture) in RGB images. Overall, RUNet outperformed the other models in the material classification of remote sensing images. The present findings indicate that it has potential for application in land use monitoring and disaster assessment as well as in model construction for simulation systems.
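The IoU and overall accuracy figures above are the standard semantic segmentation metrics. A generic sketch of how they can be computed from a confusion matrix follows (this is not code from the paper; the function name and argument conventions are illustrative):

```python
# Per-class IoU and overall pixel accuracy from a confusion matrix;
# a generic sketch, not the authors' evaluation code.
import numpy as np

def segmentation_metrics(pred, target, n_classes):
    """pred, target: integer label maps of identical shape."""
    mask = (target >= 0) & (target < n_classes)
    cm = np.bincount(
        n_classes * target[mask].astype(int) + pred[mask].astype(int),
        minlength=n_classes ** 2,
    ).reshape(n_classes, n_classes)        # rows: ground truth, cols: prediction
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)        # per-class intersection over union
    accuracy = tp.sum() / cm.sum()         # overall pixel accuracy
    return iou, accuracy
```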

Introduction

Simulation systems, which realistically reproduce and visualize real environments, are used in military training, aerospace technology, urban planning, and disaster response. For the operating interface of a simulation system to be complete, the environment displayed must be consistent with the actual scene. Because location, altitude, viewing angle, day and night lighting, and weather all vary, constructing three-dimensional terrain images is time and effort intensive; this is a key issue in the reconstruction of terrain and topography, which requires developing a terrain model from images generated by a visual system. Although such a model is a virtual construct, its content is real: it is a structured representation of the physical environment.

Realism is essential in simulation. Consider the construction of a house model: how to use textural information from the roof and wall surfaces, fitting the textured images to the three-dimensional surface of the house, to achieve a realistic effect is worth exploring. The purpose of such nesting is to obtain more specific information, as is the integration of resources from different sources. Combinations of two-dimensional (2D) with 2D data and of 2D with three-dimensional (3D) data can be roughly classified as image to image [1], [2], image to topographic map, image to model, image to light detection and ranging (LiDAR) [3], and topographic map to LiDAR.

Image fitting, a technique commonly used in simulation, involves high construction and labor costs because differences in environmental conditions, such as position, height, viewing angle, perspective, day and night lighting, and weather, correspond to variations in the spatial coordinates, texture images, size, and color of the terrain model. One challenge is having the terrain model automatically generate night vision images at night instead of manually building a separate set of terrain models. Using high-spatial-resolution image technology to classify terrain material (i.e., its characteristics, composition, and structure), combined with the spectral information of each point, can provide ground-level images with more detailed information.

In general, optical images have 4 to 50 bands and a spectral resolution of 100 nm. Hyperspectral images can have 100 to 250 bands with a spectral resolution of approximately 3 nm (5 nm in general). Higher spectral resolutions facilitate feature identification, but the resulting data are complex. Spectral responses vary by substance and wavelength, forming a so-called spectral signature, and the differences in such spectral patterns can serve as the basis for classifying features. Therefore, substances that are difficult to distinguish in general multispectral images, as well as mineral components with special spectral characteristics, can be detected in hyperspectral images. Although hyperspectral images offer many wavebands, they also comprise substantial amounts of data; coupled with the complexity of the equipment and the system, this makes such images expensive to obtain. Therefore, the present study used visible light (i.e., general RGB) and multispectral images to classify pixel materials. Visible light images have a spectral range of approximately 450 to 700 nm, whereas multispectral images contain blue, green, and red visible light as well as near-infrared (NIR) light, giving four bands with a spectral range of approximately 450 to 900 nm.
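The NDVI images examined in this study derive from the red and NIR bands just described: NDVI is the standard normalized ratio (NIR − red) / (NIR + red). A minimal sketch follows; the 4-band blue-green-red-NIR ordering is an assumption for illustration.

```python
# NDVI = (NIR - Red) / (NIR + Red), the standard vegetation index,
# computed from a 4-band (B, G, R, NIR) array. Band ordering is an
# assumption for illustration, not specified by the paper.
import numpy as np

def ndvi(image_bgrn):
    """image_bgrn: (H, W, 4) array; returns an (H, W) NDVI map in [-1, 1]."""
    red = image_bgrn[..., 2].astype(np.float64)
    nir = image_bgrn[..., 3].astype(np.float64)
    return (nir - red) / np.maximum(nir + red, 1e-6)  # guard division by zero
```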

With the gradual maturation of computer vision technology in recent years, image recognition, classification, and pixel classification methods have seen tremendous breakthroughs. The emergence of deep learning has greatly reduced the cost of manual feature extraction, and models can now effectively identify images. Apart from classifying remote sensing data with convolutional neural network (CNN) models, numerous studies have applied CNNs to the pixel classification of visible light and multispectral remote sensing images with high spatial resolution, enabling the classification of image features such as color, texture, and shape on a pixel-wise basis. This can be followed by the generation and subsequent classification of thermal (night vision) images based on the thermal energy emitted by the studied materials (Fig. 1).

Hyperspectral images contain more spectral bands, and their spectral resolution captures the reflection characteristics of the material spectrum. These reflections can be used to identify various features. The shape of the spectral curve varies with the local materials, and features and spectral radiation responses differ according to wavelength. Hyperspectral image analysis is thus the optimal approach to feature identification. However, as mentioned, hyperspectral images are expensive and difficult to obtain, and the data must be frequently updated. Therefore, a novel CNN architecture for automatic material classification based on multispectral remote sensing images is presented. The multispectral wavelength range is between 400 and 1040 nm (8 channels), whereas that of the full visible band is between 450 and 800 nm (3 channels).

The literature on model construction addresses model generation, data sources, and mapping methods. Studies on model generation involve the manual selection of image feature points [4], the manual selection of line segments with speculative features [5], and the extraction of terrain model boundaries [6]. Studies on data sources have examined ground-based images [7], aerial images [8], and combined images [9]. As for mapping methods, investigations have centered on applying small patches to flat and cylindrical surfaces [10], directly pasting real ground images on house walls [11], extracting texture images in perspective [12], pasting texture images on walls with a load [12], and pasting real images on a plane by using a perspective method [13].

Images contain numerous types of substances, which are identified and classified on the basis of characteristics such as color, texture, space, and shape. Image substances can be subjected to supervised or unsupervised classification. The supervised method uses training samples (i.e., data with known classes) to classify each input [13]. Unsupervised classification does not consider training sample data [14].

The literature contains studies on unsupervised classification methods, including K-means clustering [15], simple linear iterative clustering [16], the iterative self-organizing data analysis technique (ISODATA) [17], and fuzzy C-means clustering [18]. In recent years, some researchers have proposed the use of a CNN to build an unsupervised deep learning model [19]. Because supervised classification considers the categorical information of the training samples, it yields more accurate results than unsupervised classification. Furthermore, semi-supervised techniques based on both labeled and unlabeled samples have also been introduced [20].
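As a concrete illustration of the unsupervised route, K-means clustering [15] groups pixels purely by spectral similarity, with no labeled data. A minimal scikit-learn sketch follows; the cluster count and function name are illustrative assumptions.

```python
# Unsupervised pixel clustering with K-means: pixels are grouped by
# spectral similarity alone, with no labeled training data. A generic
# sketch; the cluster count is an arbitrary choice.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, n_clusters=5, seed=0):
    """image: (H, W, C) array; returns an (H, W) cluster-label map."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(pixels)
    return labels.reshape(h, w)
```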

Supervised learning methods include support vector machines, neural networks, and decision trees. With advances in deep learning, CNN models have had widespread use in various applications such as computer vision and natural language processing [21], with demonstrated effectiveness and feasibility. Along with image classification, CNNs have considerable use in semantic segmentation in the field of computer vision. Semantic segmentation is the classification of pixels in images. Numerous CNN-based material classification models have been proposed, including fully convolutional networks (FCNs) [22] for material classification; SegNet [23], a deep convolutional encoder–decoder architecture, for classifying streetscape images; U-Net [24] for classifying medical images; the multibias model (MBM) [25], which incorporates multibias into the rectified linear unit (ReLU) layer, for classifying hyperspectral images; the region-enhanced CNN (ReCNN) model [26], which detects objects in remote sensing images; and the deep spatial–spectral subspace clustering network (DS3C-Net) [27].

A study on telemetry images combined geometric edge feature extraction with multiscale road detection [28]; moreover, high-resolution visible light images were combined with Gabor, morphological filtering, and simple linear iterative aggregation. In another study, class- and region-adjacent graphs were used, and a hierarchical graph–based method for classifying roads was proposed [29]. Furthermore, models for CNN-based image and material classification have been developed: a dual-flow architecture for textural feature extraction and classification [30], a model for the classification of roads and buildings [31], and a comprehensive approach for scene and material classification [32].

Although multispectral remote sensing images are relatively inexpensive and easy to obtain and can be subjected to material classification, the following challenges remain:

1. Representing surface material characteristics in multispectral remote sensing images is difficult.

2. Conventional methods are not suitable for classifying multispectral data.

3. The correlations among the multiple channels of multispectral images are difficult to establish.

4. Training on remote sensing images is difficult because of their large size.

5. For materials whose appearance varies widely across images, obtaining complete image coverage during training is difficult.

6. The amount of each material present in a remote sensing image affects the classification results.

7. Uncertainty is easily introduced in material feature extraction and classification, negatively affecting classification accuracy. In hyperspectral images, uncertainty can stem from atmospheric conditions, radiation, and spatial resolution.

8. When optical data with low-to-moderate spatial resolution are processed, mixed pixels between land cover classes affect classification and recognition.

As mentioned, simulation systems use terrain models, which are costly to build, to realistically present the environment. To ensure that the same set of terrain models covers terrain images captured under varying conditions (e.g., daylight and night), information on object materials must be incorporated into the terrain model. Material identification is relatively labor and time intensive, so identifying the materials in images beforehand can save substantial model-building costs. The present study used deep learning to classify pixels in visible light and multispectral remote sensing images, and a material classification model involving a multi-CNN architecture is proposed. Material classification comprises two main procedures: model training and model testing. During training, the model derives weights suitable for material classification; during testing, the image classification results are obtained from the trained model.

The novel CNN architecture for material classification presented in this study makes multiple contributions. First, the material classification results enable conversion to thermal (night vision) images. Second, the architecture can replace hyperspectral images, which, as mentioned, are expensive and difficult to obtain and must be frequently updated. Third, the costs of terrain model construction are reduced. Fourth, users can rapidly obtain information on topographical changes from the material classification results. Finally, the classification accuracy of this architecture surpasses that of conventional and similar neural network architectures. In the future, multi-size images and street view images will be combined to further enhance the accuracy of material classification.

Related work

The material classification model in the present study performs semantic segmentation, in which image pixels are classified. This is similar to a CNN used to classify an entire image, except that the material classification model does not have a fully connected layer. The CNN process begins with convolution and pooling, which connect the image to its features; independent analysis is then conducted. CNNs have three layers.
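To make the convolution-then-pooling step concrete, here is a minimal PyTorch example; the layer sizes are arbitrary illustrations, not values from the paper.

```python
# Minimal convolution + pooling step, the first stage of the CNN pipeline
# described above; layer sizes are arbitrary illustrations.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)                              # one 3-channel 64x64 image
features = nn.Conv2d(3, 16, kernel_size=3, padding=1)(x)   # -> (1, 16, 64, 64)
features = nn.ReLU()(features)                             # nonlinearity
pooled = nn.MaxPool2d(2)(features)                         # -> (1, 16, 32, 32)
```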

Proposed method

In this study, a novel CNN-based model was used to classify substances in remote sensing images. The processing flow is divided into the training and testing of the material classification model. Robust U-Net (RUNet), a novel neural network architecture, is proposed for material classification as follows. The model is based on the improved U-Net architecture combined with the shortcut connections of ResNet. During training, the training images are preprocessed: multiple fixed-size subimages are randomly cropped from each training image.
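The random-cropping preprocessing can be sketched as follows; the subimage size is a placeholder, and this is not the authors' code.

```python
# Random cropping of fixed-size training subimages from a large remote
# sensing image and its pixel-label map; sizes are placeholders.
import numpy as np

def random_crop(image, labels, size=256, rng=None):
    """image: (H, W, C) array; labels: (H, W) array. Returns an aligned pair."""
    rng = rng or np.random.default_rng()
    h, w = labels.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return (image[top:top + size, left:left + size],
            labels[top:top + size, left:left + size])
```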

Experiment

This section describes the experimental environment and the image datasets, then presents the evaluation criteria and discusses the effectiveness of the experiments.

Conclusions

In this study, RUNet, a novel material classification model that combines multiple CNN architectures, was proposed. RUNet classifies pixels in multispectral remote sensing images and has the potential to reduce the costs (e.g., manpower) involved in terrain model construction. The material classification process is divided into model training and testing, with images preprocessed through random cropping during training. Testing involves image preprocessing through mirror padding and overlapping cropping.
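The test-time mirror padding and overlapping cropping can be sketched as below: the image is reflect-padded so border pixels gain surrounding context, overlapping tiles are classified, and only each tile's central region is written back when merging. Tile and stride sizes are placeholders, and `model` stands for any per-pixel classifier; this is an assumption-laden sketch, not the paper's implementation.

```python
# Test-time tiling sketch: reflect-pad, classify overlapping tiles, and
# keep only each tile's center when merging, so every pixel is predicted
# with context. Assumes `model` maps an (h, w, C) window to an (h, w)
# label map and accepts smaller windows at the image borders.
import numpy as np

def tiled_predict(model, image, tile=256, stride=128):
    pad = (tile - stride) // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.int64)
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            window = padded[top:top + tile, left:left + tile]
            pred = model(window)                  # per-pixel labels for the tile
            ch = min(stride, h - top)             # clip center region at borders
            cw = min(stride, w - left)
            out[top:top + ch, left:left + cw] = pred[pad:pad + ch, pad:pad + cw]
    return out
```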

CRediT authorship contribution statement

Chuen-Horng Lin: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Ting-You Wang: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant No. MOST 109-2221-E-025-010.

References (47)

  • Habib, A., et al., Co-registration of photogrammetric and LIDAR data: Methodology and case study, Rev. Bras. Cartogr. (2004)

  • Rau, J.Y., et al., Robust reconstruction of building models from three-dimensional line segments, Photogramm. Eng. Remote Sens. (2003)

  • Chen, L.C., et al., Building reconstruction from LIDAR data and aerial imagery

  • Tsai, F., et al., Polygon-based texture mapping for cyber city 3D building models, Int. J. Geogr. Inf. Sci. (2007)

  • Wu, J., et al., Automatic retrieval of optimal texture from aerial video for photo-realistic 3D visualization of street environment

  • Früh, C., et al., Constructing 3D city models by merging aerial and ground views, IEEE Comput. Graph. Appl. (2003)

  • Catmull, E., Computer display of curved surfaces, in: Proceedings of the IEEE Conference on Computer Graphics, Pattern...

  • Chon, J., et al., Urban visualization through video mosaics based on 3-D multibaselines, Int. Arch. Photogramm. Remote Sens. (2004)

  • Frueh, C., et al., Data processing algorithms for generating textured 3D building facade meshes from laser scans and camera images, Int. J. Comput. Vis. (2005)

  • Ghamisi, P., et al., Feature selection based on hybridization of genetic algorithm and particle swarm optimization, IEEE Geosci. Remote Sens. Lett. (2015)

  • MacQueen, J., Some methods for classification and analysis of multivariate observations, in: Proceedings of the fifth...

  • Achanta, R., et al., SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell. (2012)

  • Ball, G.H., et al., ISODATA, a novel method of data analysis and classification, Tech. Rep. (1965)
    View full text