A novel convolutional neural network architecture for automatic material classification of multispectral remote sensing images
Introduction
Simulation systems, which realistically reproduce and visualize real environments, are used in military training, aerospace technology, urban planning, and disaster response. For the operating interface of a simulation system to be complete, the environment displayed must be consistent with the actual scene. Variations in location, altitude, viewing angle, day and night lighting, and weather make the construction of three-dimensional terrain images time- and effort-intensive; this is a key issue in the reconstruction of terrain and topography, which requires developing a terrain model from images generated by a visual system. Although the result is a virtual construct, its content is real, and the physical environment is structured on the Internet.
Realism is essential in simulation. Consider the construction of a house model. Using textural information from the roof and wall surfaces and fitting the textured images to the three-dimensional surface of the house to achieve a realistic effect is worthy of exploration. The purpose of nesting is to obtain more information with higher specificity, as is integrating resources from different sources. Combinations of two-dimensional (2D) with 2D data and of 2D with three-dimensional (3D) data can be roughly classified as image to image [1], [2], image to topographic map, image to model, image to light detection and ranging [3], and the combination of topographic maps with light detection and ranging.
Image fitting, a technique commonly used in simulation, involves high construction and labor costs because differences in environmental conditions, such as position, height, viewing angle, perspective, day and night lighting, and weather, correspond to variations in the spatial coordinates, texture images, size, and color of the terrain model. One challenge is the automatic generation of night vision images by the terrain model at night instead of manually building a set of terrain models. The use of high-spatial-resolution image technology to classify terrain material (i.e., its characteristics, composition, and structure) combined with the spectral information of each point can provide ground-level images with more detailed information.
In general, optical images have 4 to 50 bands and a spectral resolution of 100 nm. Hyperspectral images can have 100 to 250 bands, with a spectral resolution of 3 nm (5 nm in general). Higher spectral resolutions facilitate feature identification, but the data in such images are heterogeneous. Spectral responses vary by substance and wavelength, forming a so-called spectral signature. The differences in such spectral patterns can be used as the basis for classifying features. Therefore, substances that are difficult to distinguish in general multispectral images, as well as mineral components with special spectral characteristics, can be detected in hyperspectral images. However, because hyperspectral images have many bands, they comprise substantial amounts of data; coupled with the complexity of the equipment and the system, this means that such images are expensive to obtain. Therefore, the present study used visible light (i.e., general RGB) and multispectral images to classify the pixel material. Visible light images have a spectral range of approximately 450 to 700 nm, whereas multispectral images contain blue, green, and red visible light as well as near-infrared (NIR) light; they have four bands, with a spectral range of approximately 450 to 900 nm.
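The extra NIR band is what makes four-band multispectral imagery more informative than RGB alone. As a minimal, hypothetical illustration (the band order B, G, R, NIR and the toy reflectance values are assumptions, not data from this study), the widely used normalized difference vegetation index (NDVI) can be computed per pixel from the red and NIR bands:

```python
import numpy as np

# Toy 2x2 multispectral image with 4 bands per pixel: B, G, R, NIR.
# Band order and reflectance values are illustrative assumptions.
img = np.array([
    [[0.10, 0.12, 0.10, 0.60], [0.20, 0.25, 0.30, 0.35]],
    [[0.05, 0.40, 0.08, 0.70], [0.30, 0.30, 0.30, 0.30]],
])

red, nir = img[..., 2], img[..., 3]
# NDVI = (NIR - R) / (NIR + R); vegetation reflects strongly in NIR,
# so high NDVI suggests vegetated pixels.
ndvi = (nir - red) / (nir + red + 1e-12)
print(np.round(ndvi, 3))
```

Per-pixel band ratios such as this are one simple way spectral signatures separate materials that look identical in RGB.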
With the gradual maturation of computer vision technology in recent years, image recognition, classification, and pixel classification methods have also seen tremendous breakthroughs. The emergence of deep learning has greatly reduced the cost of artificial feature extraction; models can effectively identify images. Apart from classifying telemetry data with convolutional neural network (CNN) models, numerous studies have applied the concept of CNNs to the pixel classification of visible light and multispectral telemetry images with high spatial resolution, enabling the classification of image features such as color, texture, and shape on a pixel-wise basis. This is followed by the generation and subsequent classification of thermal (night vision) images by using the thermal energy emitted by the studied material (Fig. 1).
Hyperspectral images contain more spectral bands, and their high spectral resolution captures the reflection characteristics of material spectra. These reflections can be used to identify various features. The shape of the spectral curve varies with the local materials, and features and spectral radiation responses differ according to the wavelength. Hyperspectral image analysis is thus the optimal approach to feature identification. However, as mentioned, hyperspectral images are expensive and difficult to obtain, and the data must be frequently updated. Therefore, a novel CNN architecture for automatic material classification based on multispectral telemetry images is presented. The multispectral wavelength is between 400 and 1040 nm (8 channels), whereas that of the full band is between 450 and 800 nm (3 channels).
The literature on model construction addresses model generation, sources of resources, and mapping methods. Studies on model generation involve the artificial selection of image feature points [4], the artificial selection of line segments with speculative features [5], and the extraction boundaries of terrain models [6]. Studies on resource sources have examined ground-based images [7], aerial images [8], and combined images [9]. As for mapping methods, investigations have centered on applying small patches to flat and cylindrical surfaces [10], directly pasting real ground images on house walls [11], extracting texture images in perspective [12], pasting texture images on walls with a load [12], and pasting real images on a plane by using a perspective method [13].
Images contain numerous types of substances, the identification and classification of which is based on characteristics such as color, texture, space, and shape. Image substances can be subjected to supervised or unsupervised classification. The supervised method uses training samples (i.e., data with known classes) to classify each input [13]. Unsupervised classification does not consider training sample data [14].
The literature contains studies on unsupervised classification methods, including K-means clustering [15], simple linear iterative clustering [16], the iterative self-organizing data analysis technique [17], and fuzzy C-means clustering [18]. In recent years, some researchers have proposed the use of a CNN to build an unsupervised deep learning model [19]. Because supervised classification considers the categorical information of the training samples, it yields more accurate results than unsupervised classification. Furthermore, semi-supervised techniques that are based on labeled training samples and unlabeled samples have also been introduced [20].
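As a concrete sketch of one such unsupervised method, the following is a minimal K-means clustering of pixel spectra. The toy spectra and the deterministic initialization are illustrative assumptions, not a reproduction of the cited algorithms:

```python
import numpy as np

def kmeans(pixels, k, iters=20):
    """Minimal K-means: pixels is (N, bands); returns labels and centroids."""
    # Deterministic init (first k pixels) keeps this sketch reproducible.
    centroids = pixels[:k].astype(float).copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Distance from every pixel spectrum to every centroid.
        d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean(axis=0)
    return labels, centroids

# Toy 3-band spectra forming two obvious groups
# (e.g., dark "water-like" vs bright "vegetation-like" reflectance).
pix = np.array([[0.10, 0.10, 0.10], [0.12, 0.10, 0.09],
                [0.80, 0.70, 0.90], [0.82, 0.71, 0.88]])
labels, _ = kmeans(pix, k=2)
print(labels)
```

Pixels 0 and 1 end up in one cluster and pixels 2 and 3 in the other, without any labeled training data, which is the defining trait of the unsupervised methods above.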
Supervised learning methods include support vector machines, neural networks, and decision trees. With advances in deep learning, CNN models have had widespread use in various applications such as computer vision and natural language processing [21], with demonstrated effectiveness and feasibility. Along with image classification, CNNs have considerable use in semantic segmentation in the field of computer vision. Semantic segmentation is the classification of pixels in images. Numerous CNN-based material classification models have been proposed, including fully convolutional networks (FCNs) [22] for material classification; SegNet [23], a deep convolutional encoder–decoder architecture, for classifying streetscape images; U-Net [24] for classifying medical images; the multibias model (MBM) [25], which incorporates multibias into the rectified linear unit (ReLU) layer, for classifying hyperspectral images; the region-enhanced CNN (ReCNN) model [26], which detects objects in remote sensing images; and the deep spatial–spectral subspace clustering network (DS3C-Net) [27].
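The encoder–decoder models listed above share one structural idea: the encoder downsamples feature maps and the decoder restores the input resolution so that every pixel receives a label. A minimal sketch of that shape arithmetic, using plain max pooling and nearest-neighbor upsampling (an illustration of the pattern only, not any specific model's layers):

```python
import numpy as np

def maxpool2x2(x):
    """Encoder step: 2x2 max pooling halves each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Decoder step: nearest-neighbor upsampling doubles each dimension."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for a feature map
encoded = maxpool2x2(x)        # 4x4 -> 2x2: spatial detail is compressed
decoded = upsample2x(encoded)  # 2x2 -> 4x4: resolution is restored
print(encoded.shape, decoded.shape)
```

Because the decoder output matches the input size, a per-pixel class map can be produced, which is exactly what distinguishes semantic segmentation from whole-image classification.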
A study on telemetry images combined geometric edge feature extraction with multiscale road detection [28]; moreover, high-resolution visible light images were combined with Gabor filtering, morphological filtering, and simple linear iterative clustering. In another study, class- and region-adjacent graphs were used, and a hierarchical graph-based method for classifying roads was proposed [29]. Furthermore, models for CNN-based image and material classification have been developed: a dual-flow architecture for textural feature extraction and classification [30], a model for the classification of roads and buildings [31], and a comprehensive approach for scene and material classification [32].
Although multispectral telemetry images are relatively inexpensive and easy to obtain and can be subjected to material classification, the following challenges remain:
1. Presenting material characteristics on the surface of multispectral telemetry images is difficult.
2. Conventional methods are not suitable for the classification of multispectral data.
3. The relevance of multichannel data cannot be easily established from multispectral images.
4. Training on telemetry images is difficult because of their large size.
5. For materials with numerous image changes, obtaining complete images during training is difficult.
6. The amount of material in the telemetry image affects the classification results.
7. Uncertainty is easily produced in material feature extraction and classification, negatively affecting classification accuracy. When examining hyperspectral images, uncertainty can stem from atmospheric conditions, radiation, and spatial resolution.
8. When processing optical data with low-to-moderate spatial resolution, mixed pixels between coverage classes affect classification and recognition.
As mentioned, simulation systems use terrain models, which are costly to build, to realistically present the environment. To ensure that the same set of terrain models covers terrain images captured under varying conditions (e.g., daylight and night), information on object materials must be incorporated into the terrain model. Material identification is relatively labor and time intensive. Identifying the materials in images beforehand can save substantial model-building costs. The present study used deep learning to classify pixels in visible light and multispectral telemetry images, and a material classification model involving a multi-CNN architecture was proposed. Material classification procedures are mainly classified into model training and model testing. During the training process, the model derives weights suitable for material classification. In the testing process, the image classification results can be obtained through the material classification model.
The novel CNN architecture for material classification presented in this study has multiple contributions. First, after material classification is performed, thermal images can be converted. Second, this architecture can replace hyperspectral images, which, as mentioned, are expensive and difficult to obtain and must be frequently updated. Third, the costs of terrain model construction are reduced. Fourth, regarding material classification, users can rapidly obtain information on topographical changes. Finally, the classification accuracy of this architecture surpasses that of conventional and similar neural network architectures. In the future, multi-size images and street view images will be combined to enhance the accuracy of material classification.
Related work
The material classification model in the present study involves semantic segmentation, in which the image pixels are classified. This is similar to a CNN used for classifying an entire image, except that the material classification model does not have a fully connected layer. The CNN process begins with convolution and pooling, which is followed by the connection of the image to the features. Subsequently, independent analysis is conducted. CNNs have three layers. The
Proposed method
In this study, a novel CNN-based model was used to classify substances in telemetry images. The processing flow is divided into the training and testing of the material classification model. Robust U-Net (RUNet), a novel type of neural network model architecture, is proposed for material classification as follows. This model is based on the improved U-Net architecture, combined with the shortcut connections in ResNet. During training, the training image is preprocessed. Specifically, multiple
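The ResNet-style shortcut connection mentioned above adds a block's input directly to its output. The following toy sketch shows the pattern on a single 2-D feature map with a naive 3x3 convolution; it illustrates the shortcut idea only and is not the actual RUNet layer stack:

```python
import numpy as np

def conv3x3_same(x, kernel):
    """Naive 3x3 'same' convolution (cross-correlation) on a 2-D map."""
    p = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def residual_block(x, kernel):
    """ResNet-style shortcut: output = ReLU(conv(x)) + x, so gradients
    can flow through the identity path even if conv(x) is uninformative."""
    return np.maximum(conv3x3_same(x, kernel), 0.0) + x

x = np.ones((4, 4))
identity_k = np.zeros((3, 3))
identity_k[1, 1] = 1.0  # identity kernel, so conv(x) == x here
y = residual_block(x, identity_k)
print(y)  # identity kernel -> y = 2 * x
```

In RUNet, per the description above, such shortcuts are combined with a U-Net-style encoder–decoder rather than applied to a single map as in this sketch.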
Experiment
Descriptions of the experimental environment and the image datasets are presented as follows. Next, the experimental evaluation criteria are examined, and the effectiveness of the experiment is discussed.
Conclusions
In this study, RUNet, a novel material classification model that combines multiple CNN architectures, was proposed. RUNet classifies pixels in multispectral telemetry images and has potential to reduce the costs (e.g., in manpower) involved in terrain model construction. The material classification process is divided into model training and testing, with images preprocessed through random cropping during training. Testing involves image preprocessing through mirror compensation and overlapping
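The two preprocessing steps named above, random cropping during training and mirror compensation before tiling at test time, can be sketched as follows; the patch size, margin, and toy image are illustrative assumptions rather than the study's actual parameters:

```python
import numpy as np

def random_crop(img, size, rng):
    """Training step: crop a (size x size) patch at a random location,
    so large telemetry images yield many small training samples."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def mirror_pad(img, margin):
    """Test-time step: reflect-pad the borders so tiles taken near the
    image edge still have full spatial context (mirror compensation)."""
    return np.pad(img, ((margin, margin), (margin, margin)), mode="reflect")

rng = np.random.default_rng(0)
img = np.arange(64, dtype=float).reshape(8, 8)
patch = random_crop(img, 4, rng)
padded = mirror_pad(img, 2)
print(patch.shape, padded.shape)  # (4, 4) (12, 12)
```

Overlapping tiles would then be cut from the padded image and their predictions merged, which matches the overlapping-tile testing scheme described above.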
CRediT authorship contribution statement
Chuen-Horng Lin: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Ting-You Wang: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant No. MOST 109-2221-E-025-010.
References (47)
- et al., CC-Modeler: a topology generator for 3-D city models, ISPRS J. Photogramm. Remote Sens., 1998.
- et al., Photogrammetric texture mapping onto planar polygons, Graph. Models Image Process., 1999.
- et al., FCM: The fuzzy c-means clustering algorithm, Comput. Geosci., 1984.
- et al., Deep spatial-spectral subspace clustering for hyperspectral image, IEEE Trans. Circuits Syst. Video Technol., 2020.
- et al., Hierarchical graph-based segmentation for extracting road networks from high-resolution satellite images, ISPRS J. Photogramm. Remote Sens., 2017.
- et al., Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification, ISPRS J. Photogramm. Remote Sens., 2018.
- et al., Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks, ISPRS J. Photogramm. Remote Sens., 2017.
- et al., Building instance classification using street view images, ISPRS J. Photogramm. Remote Sens., 2018.
- et al., Line-based modified iterated Hough transform for automatic registration of multi-source imagery, Photogramm. Rec., 2004.
- et al., An automatic image registration for applications in remote sensing, IEEE Trans. Geosci. Remote Sens., 2005.
- Co-registration of photogrammetric and LIDAR data: Methodology and case study, Rev. Bras. Cartogr.
- Robust reconstruction of building models from three-dimensional line segments, Photogramm. Eng. Remote Sens.
- Building reconstruction from LIDAR data and aerial imagery.
- Polygon-based texture mapping for cyber city 3D building models, Int. J. Geogr. Inf. Sci.
- Automatic retrieval of optimal texture from aerial video for photo-realistic 3D visualization of street environment.
- Constructing 3D city models by merging aerial and ground views, IEEE Comput. Graph. Appl.
- Urban visualization through video mosaics based on 3-D multibaselines, Int. Arch. Photogramm. Remote Sens.
- Data processing algorithms for generating textured 3D building facade meshes from laser scans and camera images, Int. J. Comput. Vis.
- Feature selection based on hybridization of genetic algorithm and particle swarm optimization, IEEE Geosci. Remote Sens. Lett.
- SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell.
- ISODATA, a novel method of data analysis and classification, Tech. Rep.