Deep support vector machine for hyperspectral image classification

doi:10.1016/j.patcog.2020.107298

Pattern Recognition

Volume 103, July 2020, 107298

https://doi.org/10.1016/j.patcog.2020.107298

Highlights

• Deep Support Vector Machine (DSVM) introduced in hyperspectral image classification.
• DSVM outperformed five other classification algorithms, including Deep Neural Network.
• DSVM is a state-of-the-art algorithm to improve hyperspectral image classification.

Abstract

To improve on the robustness of traditional machine learning approaches, emphasis has recently shifted to the integration of such methods with Deep Learning techniques. However, the classification problems, complexity and inconsistency in several spectral classifiers developed for hyperspectral images are some reasons warranting further research. This study investigates the application of the Deep Support Vector Machine (DSVM) to hyperspectral image classification. Two hyperspectral images, Indian Pines and University of Pavia, are used as tentative test beds for the experiment. The DSVM is implemented with four kernel functions: Exponential Radial Basis Function (ERBF), Gaussian Radial Basis Function (GRBF), neural and polynomial. Stand-alone Support Vector Machines form the interconnecting weights of the entire network. The network is trained with one hundred input datasets, and the interconnecting weights of the network are initialised using the regularisation parameter of the model. Numerical results show that the classification accuracies of the DSVM for Indian Pines and University of Pavia for each kernel function are: ERBF (98.87%, 98.16%), GRBF (98.90%, 98.47%), neural (98.41%, 97.27%), and polynomial (99.24%, 98.79%). When the DSVM algorithm is compared against well-known classifiers, namely Support Vector Machine (SVM), Deep Neural Network (DNN), Gaussian Mixture Model (GMM), K Nearest Neighbour (KNN), and K Means (KM), the mean classification accuracies for Indian Pines and University of Pavia are: DSVM (98.86%, 98.17%), SVM (76.03%, 73.52%), DNN (94.45%, 93.79%), GMM (76.82%, 78.35%), KNN (76.87%, 78.80%), and KM (21.65%, 18.18%). These results indicate that the DSVM outperformed the other classification algorithms. The high accuracy obtained with the DSVM validates its efficacy as a state-of-the-art algorithm for hyperspectral image classification.
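The reported DSVM mean accuracies can be checked against the four per-kernel figures quoted above. The short sketch below does that arithmetic and also computes the margin of the DSVM mean over each comparison classifier in percentage points. It assumes the DSVM mean is the plain arithmetic mean over the four kernels, rounded half-up to two decimal places; the abstract does not state the rounding rule, and the variable and function names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-kernel DSVM accuracies (%) quoted in the abstract:
# (Indian Pines, University of Pavia)
kernel_accuracy = {
    "ERBF": (Decimal("98.87"), Decimal("98.16")),
    "GRBF": (Decimal("98.90"), Decimal("98.47")),
    "neural": (Decimal("98.41"), Decimal("97.27")),
    "polynomial": (Decimal("99.24"), Decimal("98.79")),
}

# Mean accuracies (%) quoted for the comparison classifiers.
baseline_mean = {
    "SVM": (Decimal("76.03"), Decimal("73.52")),
    "DNN": (Decimal("94.45"), Decimal("93.79")),
    "GMM": (Decimal("76.82"), Decimal("78.35")),
    "KNN": (Decimal("76.87"), Decimal("78.80")),
    "KM": (Decimal("21.65"), Decimal("18.18")),
}


def mean_two_places(values):
    """Arithmetic mean, rounded half-up to two decimals (assumed rounding rule)."""
    values = list(values)
    mean = sum(values) / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Mean over the four kernels, separately for each image.
dsvm_mean = tuple(
    mean_two_places(acc[i] for acc in kernel_accuracy.values()) for i in range(2)
)

# Margin of the DSVM mean over each baseline, in percentage points.
margin = {
    name: (dsvm_mean[0] - acc[0], dsvm_mean[1] - acc[1])
    for name, acc in baseline_mean.items()
}

for name, (pines, pavia) in margin.items():
    print(f"DSVM - {name}: Indian Pines {pines}, University of Pavia {pavia}")
```

Under these assumptions the per-kernel figures reproduce the quoted DSVM means, and the smallest margin on both images is over the DNN.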

Introduction

Hyperspectral Images (HSIs) have been found invaluable because of their numerous applications and their ability to capture remotely sensed information from the visible through the near infrared wavelength ranges, thus providing multi-spectral channels from the same location (e.g. [10], [27]). HSIs are highly innovative remote sensing imageries that consist of hundreds of contiguous narrow spectral bands, which, unlike conventional panchromatic and multispectral imageries, enable a more distinct discrimination of object classes [34]. However, the major challenge for scientists is how to efficiently classify HSIs (see, e.g., [17]). Some of these challenges, as detailed by Ghamisi et al. [10], include the increased presence of redundant spectral information and high dimensionality in observed data, among others. Several conventional unsupervised and supervised machine learning classifiers have been used for classifying HSIs; prominent conventional unsupervised classifiers include Fuzzy C-Means (FCM) and K Means (KM). While notable conventional supervised classifiers (e.g., K Nearest Neighbour (KNN) and Gaussian Mixture Model (GMM)) have been used in the classification of HSIs, the use of contemporary classifiers such as Support Vector Machine (SVM) and Artificial Neural Network (ANN) is gradually emerging [20], [31].

Recently, however, emphasis has shifted from conventional methods to the integration of Deep Learning (DL) and ANN, which scientists have argued greatly enhances the robustness of the traditional ANN. For example, Paoletti et al. [28] showed that the use of DL to train the conventional ANN significantly increased the efficiency of the ANN for HSI classification. Haut et al. [11] implemented an integrated Deep Convolutional Neural Network (DCNN) using a new Bayesian approach and found that the hybrid of DL and the traditional ANN classifier enhanced the efficiency of the traditional ANN classifier. Furthermore, a novel guided filter based Deep Recurrent Neural Network (DRNN) for HSI classification has proved to be more efficient than the traditional ANN classifier [8]. Zhao et al. [35] recently proposed integrating a Convolutional Neural Network (CNN) with Gray Level Co-occurrence Matrix (GLCM) textural features for HSI classification using limited training samples. More recently (e.g., [19]), other DL ANN algorithms such as the Deep Belief Network have been employed in HSI classification, with major highlights on their merits as opposed to conventional methods. However, classification problems associated with small-sized training datasets in traditional machine learning techniques have been reported (e.g., [4]).

Recent advances in convolutional neural networks, including activation functions, loss functions, regularization, optimization and fast computation, have been documented. Gu et al. [7], who provided details on these advances, also highlighted the weaknesses of CNN, indicating that computational efficiency and the choice of suitable hyper-parameters (e.g., learning rate, kernel sizes of convolutional filters) are still challenging issues, especially for large-scale data. The use of a new CNN architecture for the classification of hyperspectral images was therefore predicated on the computational constraints of CNN algorithms when applied to high-dimensional data contained in multidimensional data cubes [28]. Although the application of deep learning techniques, especially CNN, in image-based cancer detection, diagnosis and other disciplines has shown significant strength [13], [18], improvements are required to handle large-scale multi-resolution data cubes. It is against this background that assessing the skills of other non-parametric deep learning algorithms such as the deep SVM has become necessary.

In addition to other classification methods such as random forests, neural networks, and logistic regression-based techniques, the SVM [5] is another robust classifier that has been used in hyperspectral data classification (e.g., [2], [10]). Since its introduction, the SVM has proven to be very efficient in remote sensing (RS) image classification, tide analysis, and prediction of urban land use change (e.g., [24]). Owing to their ability to model complex real-world data, SVMs were found to be relatively better predictive models than agent-based models, whose inability to evaluate model constraints and results has been highlighted in previous studies (e.g., [23], [25], [29], [30], [36]). Even though new algorithms such as Supervised Fuzzy Partitioning are now competitive with the SVM, the latter is still largely characterized as a state-of-the-art algorithm [1]. The SVM is intrinsically a binary classifier; however, it can be modified for multi-class problems, mainly by using the One Against All (OAA) or One Against One (OAO) technique [14]. The OAA and OAO techniques have proven to be considerably effective in the classification of remote sensing images [26].
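To make the OAA idea concrete, the following minimal sketch trains nothing; it only shows how K binary linear decision functions of the form f(x) = w · x + b can be combined into one multi-class prediction. It assumes the common winner-takes-all rule (pick the class whose binary classifier returns the largest decision value), which this text does not confirm as the rule used in the study. The weights, biases and sample points are hypothetical, hand-picked values in a two-band toy space.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_against_all(models: List[Tuple[Vector, float]], x: Vector) -> int:
    """Return the index of the class whose binary classifier scores x highest.

    Each entry of `models` is the (w, b) pair of a classifier trained to
    separate one class from all the others (assumed winner-takes-all rule).
    """
    scores = [decision_value(w, b, x) for w, b in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical (w, b) pairs for three classes; not taken from the paper.
toy_models = [
    ([1.0, 0.0], 0.0),    # class 0 responds to high values in band 1
    ([0.0, 1.0], 0.0),    # class 1 responds to high values in band 2
    ([-1.0, -1.0], 0.5),  # class 2 responds to low values in both bands
]

toy_pixels = [[2.0, 0.5], [0.2, 1.5], [-1.0, -1.0]]
predictions = [predict_one_against_all(toy_models, x) for x in toy_pixels]
print(predictions)
```

OAA needs only K binary classifiers for K classes, whereas OAO needs K(K-1)/2 pairwise classifiers combined by voting.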

While classification problems still exist with traditional machine learning techniques, the complexity (e.g., availability of training samples) surrounding the implementation of different classification algorithms is another reason warranting further research on ideal techniques for HSI. This was echoed in Ghamisi et al. [10], who noted the inconsistency in several spectral classifiers developed for HSI based on selected metrics. Furthermore, the use of deep belief networks in improving classification outputs of hyperspectral images and multi-temporal images has recently been demonstrated [15], [19]. Whereas several parametric and prominent non-parametric algorithms have been widely used in image classification (see, e.g., [10], [20], [31]), the assessment and accuracy of HSI classification based on the Deep Support Vector Machine (DSVM) is largely undocumented. One of the key challenges with HSI classification is limited training samples. It is for this reason that the development of new optimization algorithms such as deep learning is increasingly becoming popular and effective in the fields of image recognition and classification, especially for the classification of large multi-spectral and hyperspectral datasets [28]. Moreover, the success of these new algorithms in automatic feature extraction, computer vision, language processing and speech recognition has recently been re-echoed [17]. There are nonetheless still some constraints on the application of these methods to multispectral and hyperspectral images. A regularized ensemble framework of deep learning that incorporates SVM is therefore crucial to further explore these challenges.

Arguably, deep learning algorithms have attracted the attention of the remote sensing community and several other experts in the fields of speech recognition, computer vision, and natural language processing, among others. To explore the application of deep learning methods in hyperspectral image classification, a multi-grained network that appears to be an ensemble deep learning method has been proposed [27]. Another hybrid model that integrates an unsupervised deep belief network with a one-class SVM was found to be scalable and computationally efficient in an earlier study [6]. While theoretical foundations and optimization techniques for learning deep CNN architectures are required [7], ensemble deep learning methods have shown considerable potential in the classification of HSI. For example, the coupling of a deep belief network with an SVM algorithm addressed complexity and scalability issues of SVM in large datasets [6]. Hybrid models are therefore emerging as efficient, accurate and scalable techniques that can improve the classification accuracy of large-scale and high-dimensional data.

The aim of this research therefore is to integrate DL and SVM to formulate a hybrid DSVM. The architecture of the DSVM proposed for this experiment imitates the Deep Neural Network (DNN), which consists of multiple hidden layers. Normally the hidden layer neurons are connected by a series of weights; here, instead, the weights are initialised by several SVM functions modified with the SVM regularisation parameter. The optimal DSVM output for each input is found by updating all the connecting SVM functions in the hidden layer. Two HSIs, Indian Pines and University of Pavia, are used as experimental and tentative test beds for the study. The OAA multi-class technique is used to modify the SVM for multi-class separation. To assess the robustness of our hybrid DSVM model, the results of the DSVM are compared to those of the SVM, DNN, GMM, KNN, and KM.
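The abstract names four kernel functions for the SVM units (ERBF, GRBF, neural and polynomial) but this excerpt does not give their formulas. As a rough illustration only, the sketch below uses common textbook forms of these kernels. The exact parameterisations, the parameter names (sigma, degree, scale, offset) and their default values are assumptions and are not settings taken from the study.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(x: Vector, z: Vector) -> float:
    """Inner product x . z."""
    return sum(a * b for a, b in zip(x, z))


def _distance(x: Vector, z: Vector) -> float:
    """Euclidean distance ||x - z||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def grbf_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF (assumed form): exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-_distance(x, z) ** 2 / (2.0 * sigma ** 2))


def erbf_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Exponential RBF (assumed form): exp(-||x - z|| / (2 sigma^2))."""
    return math.exp(-_distance(x, z) / (2.0 * sigma ** 2))


def polynomial_kernel(x: Vector, z: Vector, degree: int = 2) -> float:
    """Polynomial kernel (assumed form): (x . z + 1)^degree."""
    return (_dot(x, z) + 1.0) ** degree


def neural_kernel(x: Vector, z: Vector, scale: float = 1.0, offset: float = 0.0) -> float:
    """Neural (sigmoid) kernel (assumed form): tanh(scale * x . z + offset)."""
    return math.tanh(scale * _dot(x, z) + offset)


# Two hypothetical spectra with three bands each.
a = [0.2, 0.4, 0.6]
b = [0.1, 0.5, 0.9]
for kernel in (erbf_kernel, grbf_kernel, neural_kernel, polynomial_kernel):
    print(kernel.__name__, kernel(a, b))
```

Both radial kernels equal 1 when the two inputs coincide and decay with distance; in these forms the ERBF decays with the distance itself rather than its square, so it falls off more slowly for widely separated inputs.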

Section snippets

Deep support vector machine framework

SVM classifies a binary problem using a linear hyperplane by assuming that the training set has $n$ training samples, that is, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x_i \in \Re^N$ is an $N$-dimensional vector that belongs to one of the classes $y_i \in \{-1, +1\}$ [9]. The stated binary classification problem can be separated using a linear decision function, $f(x) = w \cdot x + b$, where $w \in \Re^N$ is a vector that determines the orientation of the desired hyperplane required for the separation, and $b \in \Re$ is called the “bias.” The

Results

The DSVM presented in this study was designed to mimic the operation of the DNN. The individual SVMs, that is, f(x), were made to function as the interconnecting weights of the network. The resulting outputs were compared against the target output F(X) based on the backpropagation technique. One hundred distinct inputs X₁, X₂, …, X₁₀₀ were used to obtain one hundred distinct outputs F₁(X), F₂(X), …, F₁₀₀(X). The regularisation parameter was used to initialise the network in order to ensure that the

Discussion and conclusion

Imbalanced, multi-class learning problems have recently been addressed by Yuan et al. [33], who used a regularized ensemble framework of DL methods. They showed that DL algorithms are capable of handling multi-class data sets because of the regularization parameter. Although several other sophisticated algorithms have been used to address similar problems, especially in image-based cancer detection and diagnosis [13], reduced computational cost and efficiency of ensemble-based approaches have

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Dr Onuwa Okwuashi is an expert in machine learning applications and remote sensing of the environment. He obtained his Ph.D. in 2011 at the Victoria University of Wellington, New Zealand. He is currently a senior lecturer at the University of Uyo, Nigeria, where he teaches analytical methods in geospatial science and machine learning applications. Apart from his university teaching experience, he is also a senior consultant in the University of Uyo Geoinformatics Consult.

References (36)

M. Chi et al., Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem, Adv. Space Res. (2008)
S.M. Erfani et al., High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning, Pattern Recognit. (2016)
J. Gu et al., Recent advances in convolutional neural networks, Pattern Recognit. (2018)
Y. Guo et al., Guided filter based deep recurrent neural networks for hyperspectral image classification, Procedia Comput. Sci. (2018)
Z. Hu et al., Deep learning for image-based cancer detection and diagnosis - a survey, Pattern Recognit. (2018)
S. Kang et al., Constructing a multi-class classifier using one-against-one approach with different binary classifiers, Neurocomputing (2015)
M. Längkvist et al., A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recognit. Lett. (2014)
P. Li et al., Deep visual tracking: review and experimental comparison, Pattern Recognit. (2018)
L. Li et al., Hyperspectral image classification by AdaBoost weighted composite kernel extreme learning machines, Neurocomputing (2018)
C.E. Ndehedehe et al., Assessing land water storage dynamics over South America, J. Hydrol. (2020)
B. Pan et al., MugNet: deep learning for hyperspectral image classification using limited samples, ISPRS J. Photogramm. Remote Sens. (2018)
M.E. Paoletti et al., A new deep convolutional neural network for fast hyperspectral image classification, ISPRS J. Photogramm. Remote Sens. (2018)
J. Xu et al., Multi-model ensemble with rich spatial information for object detection, Pattern Recognit. (2020)
X. Yuan et al., A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data, Pattern Recognit. (2018)
L. Zhao et al., LandSys: an agent-based cellular automata model of land use change developed for transportation analysis, J. Transp. Geogr. (2012)
P. Ashtari et al., Supervised fuzzy partitioning, Pattern Recognit. (2020)
B. Bigdeli et al., A multiple SVM system for classification of hyperspectral remote sensing data, J. Indian Soc. Remote Sens. (2013)
W. Chen et al., A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China, Bull. Eng. Geol. Environ. (2018)

Dr Christopher Ndehedehe is an expert in remote sensing hydrology and environmental geoinformatics. His Ph.D. thesis was awarded Curtin University Chancellor's Commendation for Exceptional Higher Degree by Research. Christopher won the 2018 D. B. Johnston Award for Excellence in the Spatial Sciences area, Curtin University, Australia where he obtained his Ph.D. in 2017. Christopher is a key scientist, providing leadership and training in the remote sensing components of several projects in Australian Rivers Institute funded by the National Environmental Science Programme. Christopher is currently a Research Fellow at the Australian Rivers Institute, Griffith University where he is driving innovative research directions in remote sensing of the environment and applications of advanced multivariate techniques to assess impacts of climate change on groundwater and ecological resources.
