Article

Pairwise Elastic Net Representation-Based Classification for Hyperspectral Image Classification

1 School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430023, China
2 Electronic Information School, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Entropy 2021, 23(8), 956; https://doi.org/10.3390/e23080956
Submission received: 16 June 2021 / Revised: 9 July 2021 / Accepted: 15 July 2021 / Published: 26 July 2021
(This article belongs to the Special Issue Advances in Image Fusion)

Abstract

The representation-based algorithm has raised great interest in hyperspectral image (HSI) classification. $\ell_1$-minimization-based sparse representation (SR) attempts to select a few atoms and cannot fully reflect within-class information, while $\ell_2$-minimization-based collaborative representation (CR) tries to use all of the atoms, leading to mixed-class information. Considering these problems, we propose the pairwise elastic net representation-based classification (PENRC) method. PENRC combines the $\ell_1$-norm and $\ell_2$-norm penalties and introduces a new penalty term that includes a similarity matrix between dictionary atoms. This similarity matrix enables the automatic grouping selection of highly correlated data to estimate more robust weight coefficients for better classification performance. To reduce computation cost and further improve classification accuracy, we use part of the atoms as a local adaptive dictionary rather than the entire training set. Furthermore, we consider the neighbor information of each pixel and propose a joint pairwise elastic net representation-based classification (J-PENRC) method. Experimental results on chosen hyperspectral data sets confirm that our proposed algorithms outperform other state-of-the-art algorithms.

1. Introduction

A hyperspectral image is a 3D remote sensing image containing hundreds of bands, from the visible to the infrared spectrum. Due to their abundant spectral information, HSIs have found practical applications in the field of remote sensing, such as skin imaging [1], ground-element identification [2] and mineral exploration [3]. To date, many classification algorithms for hyperspectral datasets have been proposed. Among these techniques, the support vector machine (SVM) [4], the Gaussian mixture model (GMM) [5] and the Gaussian maximum-likelihood classifier (MLC) [6] have all proved effective for the HSI classification problem. The most actively studied methods of recent years can be roughly divided into two categories: representation-based algorithms and deep learning-based algorithms. On the one hand, in order to make full use of the spectral and spatial information of HSIs, some effective spectral–spatial feature extraction methods have been combined with sparse models to improve the characterization capability of the models [7,8,9,10]. On the other hand, since the deep convolutional neural network (CNN) has been proven to be very effective at exploiting image features, methods using deep CNNs for hyperspectral feature extraction have stimulated various studies [11,12,13,14].
This paper is mainly focused on HSI classification algorithms based on representation learning. The classification principle of this family of methods is to assume that each testing pixel can be reconstructed from labeled training pixels. The abundance coefficients of the testing pixel are then obtained under an $\ell_1$-norm or $\ell_2$-norm penalty, yielding sparse representation classification (SRC) [15] and collaborative representation classification (CRC) [16], respectively. In [17], Chen et al. first introduced the sparsity model into hyperspectral classification and proposed the joint sparse representation classification (JSRC) method by incorporating contextual information. In [18], considering that different atoms have different importance for the reconstruction process, Li et al. proposed the nearest regularized subspace (NRS) classifier with Tikhonov regularization. By wisely combining SRC and KNN, a class-dependent sparse representation classifier (cdSRC) was proposed in Ref. [19]. However, some research [18,20] shows that collaboration among atoms, rather than competition, enhances classification results. Therefore, in Ref. [21], a joint within-class CRC was provided to solve HSI classification tasks. In [22], the kernel version of CRC was further considered and the kernel-based CRC (KCRC) was proposed. There are also investigations dedicated to improving classification effectiveness. On the one hand, some focus on simpler and more robust dictionaries to reduce computation costs. On the other hand, some take neighborhood spatial information as an important factor in improving classification accuracy. In Ref. [23], the nonlocal joint collaborative representation (NJCRC) algorithm was proposed, utilizing a subdictionary whose atoms are obtained by a k-nearest neighbor (K-NN) search around the testing samples rather than the whole dictionary. In [24], Fu et al. introduced a shape-irregular neighbor region into the joint SRC model and proposed the shape-adaptive joint sparse representation (SAJSRC).
It is worth noting that both SRC-based and CRC-based algorithms have their limitations. In these representation-based classification models, the obtained abundance coefficients reflect the importance of each training sample for reconstruction. Accordingly, the primary concern of this type of method is the solution of the abundance coefficients. Ideally, the test pixel should be linearly represented by atoms from the same category, with the nonzero terms of the sparse coefficients located at the positions of the corresponding class. SRC tends to select as few atoms as possible: this excessive sparsity leads to deviation in the absolute reconstruction error, and the sparsity is weakened when the set of training atoms is small. CRC tends to select all of the atoms for reconstruction, so class discrimination is weakened by the inclusion of mixed-class information. Intuitively, balancing SRC and CRC is necessary to achieve better classification performance.
To solve the above problem, the elastic net representation-based classification (ENRC) method was proposed in Ref. [25]. The elastic net, originally proposed in [26], encourages both sparsity and grouping by forming a convex combination of the CRC and SRC penalties governed by a selectable parameter. Furthermore, the elastic net can yield a sparse estimate with more than $n$ nonzero weights. Based on these advantages, ENRC improves HSI classification performance. However, the optimal balance factors are all obtained by traversing a manually constructed parameter space, which makes the algorithm time-consuming and complex. Additionally, the pixelwise fusion algorithm cannot make full use of the spatial information of the HSI.
Fortunately, recent literature [27] has pointed out that the pairwise elastic net (PEN) model, which uses similarity measures between regressors, can establish a local balance between SRC and CRC. It can achieve more flexible grouping than ENRC. Moreover, PEN allows the customization of the sparsity relationship between any two features. Hence, in this work, we propose the pairwise elastic net representation-based classification (PENRC) method to overcome the inherent disadvantages of ENRC, SRC and CRC. It automatically balances the $\ell_1$-norm and $\ell_2$-norm so that more robust weight coefficients can be estimated, further realizing between-class sparsity and within-class collaboration for better classification performance.
Specifically, the main contributions of the proposed PENRC can be briefly summarized as follows. First, considering the computation cost of using the whole dictionary, we adopt KNN to select the labeled atoms most similar to the testing pixel as an optimal sub-dictionary. Then, unlike ENRC, which assigns only a single global tradeoff between sparsity and collaboration, we introduce a similarity matrix over the sub-dictionary atoms into the penalty, resulting in a local sparsity–collaboration tradeoff that is more flexible than ENRC. After obtaining the abundance coefficients, we use the principle of minimum reconstruction error to decide the final label. We also provide a further extension of our algorithm by incorporating the neighbor information of each pixel.
In summary, it is expected that the abundance coefficients from PENRC reveal a more powerful discriminant ability, thereby outperforming the original SRC, CRC and ENRC.
The remaining parts of the paper are organized as follows: Section 2 briefly introduces the two classical classifiers, SRC and CRC. Section 3 details the proposed PENRC mechanism. Section 4 gives the experimental results on the three chosen datasets. Finally, Section 5 concludes this paper.

2. Related Works

Denote a testing pixel as $\mathbf{y} = [y_1, \ldots, y_B]^T \in \mathbb{R}^{B \times 1}$ and the dictionary composed of training atoms in class order as $\mathbf{X} = [\mathbf{X}_1, \ldots, \mathbf{X}_C] \in \mathbb{R}^{B \times N}$, where $B$ is the number of spectral bands, $N = \sum_{c=1}^{C} N_c$ is the number of training atoms and $C$ is the total number of categories. The sub-dictionary $\mathbf{X}_c \in \mathbb{R}^{B \times N_c}$ is the set of training atoms in the $c$-th class.

2.1. Sparse Representation for HSI Classification

The sparse model assumes that a testing pixel can be suitably approximated as a linear combination of a few dictionary atoms [15]. For a testing pixel $\mathbf{y}$, the purpose of the SRC model is to obtain the corresponding abundance coefficients by minimizing the reconstruction error $\|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}_{SRC}\|_2^2$ under the sparse constraint term $\|\boldsymbol{\alpha}_{SRC}\|_1$. Mathematically, the objective function can be represented as follows:
$$\hat{\boldsymbol{\alpha}}_{SRC} = \arg\min_{\boldsymbol{\alpha}_{SRC}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}_{SRC}\|_2^2 + \lambda_1 \|\boldsymbol{\alpha}_{SRC}\|_1, \tag{1}$$
where $\lambda_1$ is the balancing parameter. The weight vector $\boldsymbol{\alpha}_{SRC} \in \mathbb{R}^{N \times 1}$ is sparse and only has a few nonzero terms. It can be obtained by solving Equation (1) with the basis pursuit (BP) or basis pursuit denoising (BPDN) algorithms [28,29]. When the $\ell_0$-norm is used directly, Equation (1) can be solved by the subspace pursuit (SP) and orthogonal matching pursuit (OMP) algorithms [30].
After obtaining the weight vector $\hat{\boldsymbol{\alpha}}_{SRC}$, we can assign to the testing pixel the final class label corresponding to the minimum reconstruction error:
$$\mathrm{class}(\mathbf{y}) = \arg\min_{c=1,\ldots,C} \|\mathbf{y} - \hat{\mathbf{y}}_c\|_2^2 = \arg\min_{c=1,\ldots,C} \|\mathbf{y} - \mathbf{X}_c \hat{\boldsymbol{\alpha}}_c^{SRC}\|_2^2, \tag{2}$$
where $\hat{\boldsymbol{\alpha}}_c^{SRC}$ is the subset of the sparse vector $\hat{\boldsymbol{\alpha}}_{SRC}$ belonging to the $c$-th class.
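To make Equations (1) and (2) concrete, the following is a minimal pixelwise SRC sketch in Python/NumPy, using scikit-learn's OMP as the greedy sparse solver; the variable names (`dictionary`, `train_labels`) are illustrative assumptions, not from the original paper.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(y, dictionary, train_labels, sparsity=5):
    """Classify one test pixel y (B,) against a dictionary of atoms (B, N)."""
    # Greedy surrogate for the l1 problem of Eq. (1): keep `sparsity` atoms.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity, fit_intercept=False)
    omp.fit(dictionary, y)
    alpha = omp.coef_                       # sparse weight vector (N,)

    # Class-wise minimum-residual rule of Eq. (2).
    residuals = {c: np.linalg.norm(y - dictionary[:, train_labels == c]
                                   @ alpha[train_labels == c])
                 for c in np.unique(train_labels)}
    return min(residuals, key=residuals.get)
```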

2.2. Collaborative Representation for HSI Classification

Unlike the SRC model, CRC assumes that a testing pixel can be linearly represented by the whole training set [21]. CRC attempts to obtain the abundance coefficients by minimizing the reconstruction error $\|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}_{CRC}\|_2^2$ with the constraint term $\|\boldsymbol{\alpha}_{CRC}\|_2^2$. Thus, CRC can be expressed as:
$$\hat{\boldsymbol{\alpha}}_{CRC} = \arg\min_{\boldsymbol{\alpha}_{CRC}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}_{CRC}\|_2^2 + \lambda_2 \|\boldsymbol{\alpha}_{CRC}\|_2^2, \tag{3}$$
where $\lambda_2$ balances the influence of the reconstruction error and the constraint term. Equation (3) has a simple closed-form solution. Setting the derivative of the above cost function to zero, we obtain the optimal value of $\boldsymbol{\alpha}_{CRC}$:
$$\boldsymbol{\alpha}_{CRC} = \left(\mathbf{X}^T\mathbf{X} + \lambda_2 \mathbf{I}\right)^{-1}\mathbf{X}^T\mathbf{y}, \tag{4}$$
where $\mathbf{I}$ is an identity matrix of size $N \times N$. After obtaining $\boldsymbol{\alpha}_{CRC}$, the final class label $c$ of the testing pixel can be determined with the minimum residual rule introduced in the last section.
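Because Equation (4) is a closed form, CRC is straightforward to implement; below is a minimal sketch under the same illustrative naming assumptions as the SRC example.

```python
import numpy as np

def crc_classify(y, X, train_labels, lam=1e-2):
    """Classify one test pixel y (B,) against the full dictionary X (B, N)."""
    N = X.shape[1]
    # Eq. (4): alpha = (X^T X + lambda*I)^{-1} X^T y; solve() avoids an explicit inverse.
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)
    residuals = {c: np.linalg.norm(y - X[:, train_labels == c]
                                   @ alpha[train_labels == c])
                 for c in np.unique(train_labels)}
    return min(residuals, key=residuals.get)
```

Since $(\mathbf{X}^T\mathbf{X} + \lambda_2\mathbf{I})^{-1}\mathbf{X}^T$ does not depend on $\mathbf{y}$, it can be precomputed once and reused for every test pixel.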
For the above representation-based classification methods, training atoms tend to be “competitive” in SRC due to the sparse constraint, whereas with the $\ell_2$-norm all atoms participate in the representation equally, so CRC tends to be “cooperative”. The performance of SRC and CRC was compared in the literature [21,22], and the experiments showed that SRC performs better in some cases while CRC performs better in others. For example, in remote sensing images, the SRC algorithm gives rise to a more remarkable improvement in the presence of mixed pixels [31]. Thus, appropriately combining SRC and CRC is an effective strategy. In fact, the FRC and ENRC algorithms were proposed in Ref. [25] to combine SRC with CRC. However, the dictionary chosen in [25] consists of all the training samples and brings a large computational burden. In addition, the algorithms in [25] only set a global trade-off between SRC and CRC, leading to an inflexible balance between different classes.

3. Proposed PENRC

The framework of our proposed PENRC algorithm is shown in Algorithm 1. First, we build a local adaptive dictionary to reduce the amount of calculation: given a test pixel, we use the KNN algorithm to select the K training pixels most similar to it as the local adaptive dictionary set. Second, we construct the PENRC model of the hyperspectral image: we use the local adaptive dictionary to build the PEN model and obtain the abundance coefficients corresponding to the testing pixel. Then, we calculate the reconstruction error of each class according to the abundance coefficients and use the minimum reconstruction error to classify the testing pixel. In addition, in order to further improve the classification performance, we also integrate the spatial information of the pixel neighborhood into the model, yielding the joint pairwise elastic net representation-based classification (J-PENRC).
Algorithm 1 The Proposed PENRC Algorithm
Input:      (1) $\mathbf{X} \in \mathbb{R}^{B \times N}$, the training set.
                 (2) $K$, $\lambda$.
Procedure:
  • Step 1: Obtain the adaptive dictionary $\mathbf{D}$ by applying KNN.
  • Step 2: Obtain the weight vector $\hat{\boldsymbol{\alpha}}$ according to Equation (8): for $i = 1:N$, update $\hat{\alpha}_i$ by Equations (19) and (20).
  • Step 3: Decide the final label $\mathrm{class}(\mathbf{y})$ by the minimum reconstruction error principle of Equation (14).
Output: $\mathrm{class}(\mathbf{y})$.

3.1. Local Adaptive Dictionary

In representation-based methods, dictionaries are usually composed of all labeled training pixels [32,33]. In order to have a robust representation, it is necessary to ensure that the dictionary is complete (that is, enough training samples are needed). However, training samples are usually limited in practice. In addition, using all training pixels directly will lead to a large amount of computation. Therefore, to solve the above problems, we utilize the local adaptive dictionary to obtain a more robust representation.
For a testing pixel $\mathbf{y}$, we utilize KNN to construct a similar-signal set $\mathbf{D}$ as the adaptive dictionary. However, due to the high dimensionality of the hyperspectral image, it is unreasonable to directly use the Euclidean distance to measure the similarity of spectral vectors. In order to increase the separability of the data, the LDA algorithm [34] is used to project HSIs into a low-dimensional space, which finds an optimal projection direction that minimizes the intraclass distance of samples and maximizes the inter-class distance. Let $\boldsymbol{\Gamma} \in \mathbb{R}^{\tilde{B} \times B}$ denote the LDA mapping matrix and $\tilde{B}$ the reduced dimension. Then, the similarity measure between the testing atom $\mathbf{y}$ and an arbitrary training atom $\mathbf{x}_n$ can be expressed as:
$$d_n = \|\boldsymbol{\Gamma}\mathbf{y} - \boldsymbol{\Gamma}\mathbf{x}_n\|_2. \tag{5}$$
Then, we sort the distances $\{d_n\}_{n=1}^{N}$ computed over $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ in ascending order and obtain the dictionary indices $i_c$, $c = 1, \ldots, C$, corresponding to the $K$ smallest distance values. The adaptive dictionary can be denoted as:
$$\mathbf{D} = \mathbf{X}_{:, i_c}, \quad c = 1, \ldots, C. \tag{6}$$
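A minimal sketch of this selection step, assuming scikit-learn's LDA implementation for the projection of Equation (5); the function and variable names are illustrative only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def local_adaptive_dictionary(y, X, train_labels, K=20):
    """X: (B, N) training atoms as columns; y: (B,) test pixel."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(X.T, train_labels)              # scikit-learn expects samples as rows
    Xp = lda.transform(X.T)                 # projected atoms, one per row
    yp = lda.transform(y[None, :])          # projected test pixel
    d = np.linalg.norm(Xp - yp, axis=1)     # Eq. (5) distances
    idx = np.argsort(d)[:K]                 # indices of the K smallest distances
    return X[:, idx], train_labels[idx]     # sub-dictionary D and its labels
```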

3.2. Pairwise Elastic Net Representation-Based Classification

First, we introduce the concept of the correlation matrix. Consider the following two matrices $\mathbf{R}_1$ and $\mathbf{R}_2$:
$$\mathbf{R}_1 = \begin{bmatrix} 1.0 & 0.5 & 0.5 \\ 0.5 & 1.0 & 0.5 \\ 0.5 & 0.5 & 1.0 \end{bmatrix} \qquad \mathbf{R}_2 = \begin{bmatrix} 1.0 & 0.9 & 0.0 \\ 0.9 & 1.0 & 0.3 \\ 0.0 & 0.3 & 1.0 \end{bmatrix}. \tag{7}$$
We can see that the three features in the $\mathbf{R}_1$ matrix have the same similarity values; in this case, it is effective to set a global trade-off between the $\ell_1$-norm and the $\ell_2$-norm. Nevertheless, for the matrix $\mathbf{R}_2$, feature 1 is very similar to feature 2 (calling for the $\ell_2$-norm), feature 1 is independent of feature 3 (calling for the $\ell_1$-norm) and feature 2 is only slightly related to feature 3 (calling for the elastic net). Hence, we need a flexible trade-off scheme to match the regularization term with the data structure.
Thus, the objective function of our proposed PENRC can be denoted as:
$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda \left( \|\boldsymbol{\alpha}\|_2^2 + \|\boldsymbol{\alpha}\|_1^2 - |\boldsymbol{\alpha}|^T \mathbf{R} |\boldsymbol{\alpha}| \right), \tag{8}$$
where $\mathbf{R} \in \mathbb{R}^{K \times K}$ is the similarity matrix between atoms of the adaptive dictionary $\mathbf{D}$. Some frequently used similarity measures are the absolute atom correlation $R_{ij} = |\mathbf{d}_i^T \mathbf{d}_j|$ and the Gaussian kernel $R_{ij} = \exp(-\|\mathbf{d}_i - \mathbf{d}_j\|^2/\sigma^2)$, where $\mathbf{d}_i$ denotes the $i$-th atom. Consider some basic identities relating the abundance coefficients and the similarity matrix:
$$\|\boldsymbol{\alpha}\|_2^2 = \boldsymbol{\alpha}^T \mathbf{I} \boldsymbol{\alpha} \tag{9}$$
$$\|\boldsymbol{\alpha}\|_1 = |\boldsymbol{\alpha}|^T \mathbf{1} = \mathbf{1}^T |\boldsymbol{\alpha}| \tag{10}$$
$$\|\boldsymbol{\alpha}\|_1^2 = |\boldsymbol{\alpha}|^T \mathbf{1}\mathbf{1}^T |\boldsymbol{\alpha}|, \tag{11}$$
where $\mathbf{I}$ is the identity matrix and $\mathbf{1}$ is a vector of all ones. The fourth term in Equation (8), representing the trade-off between the $\ell_1$-norm and the $\ell_2$-norm, can then be explained as follows. For completely similar features, $\mathbf{R} = \mathbf{1}\mathbf{1}^T$ and only the $\ell_2$-norm remains in Equation (8), removing the impact of the $\ell_1$ constraint. For completely dissimilar features, $\mathbf{R} = \mathbf{I}$ and Equation (8) reduces to the SRC model with only the $\ell_1$ constraint. That is to say, when two features are similar, we take the CRC approach; when two features are dissimilar, we take the SRC approach; in the remaining cases, we take the ENRC approach. Thus, the flexible trade-off scheme can be realized through our proposed PENRC.
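The following sketch builds the two similarity measures mentioned above and the induced penalty matrix $\mathbf{P} = \mathbf{I} + \mathbf{1}\mathbf{1}^T - \mathbf{R}$ used later in Equation (15); it assumes NumPy, with the atoms stored as $\ell_2$-normalized columns of D in the correlation case.

```python
import numpy as np

def correlation_similarity(D):
    """Absolute atom correlation R_ij = |d_i^T d_j| for normalized columns."""
    return np.abs(D.T @ D)

def gaussian_similarity(D, sigma=1.0):
    """Gaussian kernel R_ij = exp(-||d_i - d_j||^2 / sigma^2)."""
    sq = np.sum((D[:, :, None] - D[:, None, :]) ** 2, axis=0)
    return np.exp(-sq / sigma ** 2)

def penalty_matrix(R):
    """P = I + 11^T - R, the local l1/l2 trade-off used in Eq. (15)."""
    K = R.shape[0]
    return np.eye(K) + np.ones((K, K)) - R
```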
To further enhance the classification performance, we also incorporate the spatial information of the HSI pixels into the PENRC model. In [24], a shape-adaptive (SA) region is proposed for each pixel. In our work, we utilize the neighbor information within the SA region, and the chosen pixel is represented by the average of all pixels in the SA window. For an arbitrary pixel $\mathbf{y}$ in the HSI, the corresponding SA set matrix can be denoted as $\mathbf{Y}_{SA} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_T]$, where $T$ is the number of chosen pixels in the SA region. Then, the pixel $\mathbf{y}$ endowed with spatial information can be obtained by
$$\bar{\mathbf{y}}_{SA} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{y}_t. \tag{12}$$
Then, the sparse coefficients $\boldsymbol{\alpha}_{SA}$ for $\bar{\mathbf{y}}_{SA}$ can be obtained as:
$$\hat{\boldsymbol{\alpha}}_{SA} = \arg\min_{\boldsymbol{\alpha}_{SA}} \|\bar{\mathbf{y}}_{SA} - \bar{\mathbf{D}}_{SA}\boldsymbol{\alpha}_{SA}\|_2^2 + \lambda \left( \|\boldsymbol{\alpha}_{SA}\|_2^2 + \|\boldsymbol{\alpha}_{SA}\|_1^2 - |\boldsymbol{\alpha}_{SA}|^T \mathbf{R} |\boldsymbol{\alpha}_{SA}| \right). \tag{13}$$
Once the sparse coefficients $\boldsymbol{\alpha}_{SA}$ are obtained, the final label can be determined by the minimum class-wise reconstruction error:
$$\mathrm{class}(\mathbf{y}) = \arg\min_{c=1,\ldots,C} \|\bar{\mathbf{y}}_{SA} - \bar{\mathbf{D}}_c^{SA}\boldsymbol{\alpha}_c^{SA}\|_2, \tag{14}$$
where $\bar{\mathbf{D}}_c^{SA}$ and $\boldsymbol{\alpha}_c^{SA}$ denote the subsets of $\bar{\mathbf{D}}_{SA}$ and $\boldsymbol{\alpha}_{SA}$ corresponding to the $c$-th class, respectively.
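A minimal sketch of the J-PENRC front end of Equations (12)–(14), assuming NumPy; `pen_solve` refers to the coordinate-descent solver sketched in Section 3.3, and the SA region extraction of [24] is assumed to be given.

```python
import numpy as np

def jpenrc_classify(Y_sa, D, dict_labels, R, lam=1e-3):
    """Y_sa: (B, T) pixels of the shape-adaptive region around one test pixel."""
    y_bar = Y_sa.mean(axis=1)                       # Eq. (12): SA average
    alpha = pen_solve(y_bar, D, R, lam)             # Eq. (13), solver from Section 3.3
    # Eq. (14): class-wise minimum reconstruction error.
    residuals = {c: np.linalg.norm(y_bar - D[:, dict_labels == c]
                                   @ alpha[dict_labels == c])
                 for c in np.unique(dict_labels)}
    return min(residuals, key=residuals.get)
```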

3.3. Coordinate Descent

To solve Equation (8), we rewrite it as follows:
$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda\, |\boldsymbol{\alpha}|^T \mathbf{P} |\boldsymbol{\alpha}|, \tag{15}$$
where $\mathbf{P} = \mathbf{I} + \mathbf{1}\mathbf{1}^T - \mathbf{R}$. As proved in [27], the second term $|\boldsymbol{\alpha}|^T \mathbf{P} |\boldsymbol{\alpha}|$ of the above model is convex only if $\mathbf{P}$ has nonnegative entries and is a positive semidefinite (PSD) matrix. However, the matrix $\mathbf{P}$ in Equation (15) is not always PSD. We can apply the following correction, as proved in [27]:
$$\mathbf{P}_\theta^S = \theta \mathbf{I} + (1 - \theta)\mathbf{P}, \tag{16}$$
where $\frac{\tau}{\tau + 1} \le \theta \le 1$ and $\tau = -\min\left(0, \lambda_{\min}(\mathbf{P})\right)$.
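A small sketch of this correction, assuming NumPy: it shrinks $\mathbf{P}$ toward the identity just enough to guarantee positive semidefiniteness.

```python
import numpy as np

def psd_correct(P):
    """Eq. (16): P_theta = theta*I + (1-theta)*P with the smallest admissible theta."""
    lam_min = np.linalg.eigvalsh(P).min()
    tau = max(0.0, -lam_min)                # tau = -min(0, lambda_min(P))
    theta = tau / (tau + 1.0)
    return theta * np.eye(P.shape[0]) + (1.0 - theta) * P
```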
Equation (15) can then be seen as a quadratic program (QP) and solved by a QP solver. However, generic QP solvers do not scale well to high-dimensional data. In order to obtain more exact results, we use the coordinate descent method [35] in this paper. The approach can be summarized as follows: given a convex function $f(\boldsymbol{\alpha})$, we calculate the partial derivative $\partial f / \partial \alpha_i$; we update $\alpha_i$ by holding all $\alpha_j$ ($j \neq i$) fixed and solving $\partial f / \partial \alpha_i = 0$; and we cycle through each $\alpha_i$ iteratively until the termination condition is satisfied.
In PENRC, we have
$$f(\boldsymbol{\alpha}) = \|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda\, |\boldsymbol{\alpha}|^T \mathbf{P} |\boldsymbol{\alpha}| = \mathbf{y}^T\mathbf{y} - 2\mathbf{q}^T\boldsymbol{\alpha} + \boldsymbol{\alpha}^T \mathbf{Q} \boldsymbol{\alpha} + \lambda \sum_{i,j} P_{i,j} |\alpha_i| |\alpha_j|, \tag{17}$$
where $\mathbf{P}$ is PSD and nonnegative, $\mathbf{Q} = \mathbf{D}^T\mathbf{D}$ and $\mathbf{q} = \mathbf{D}^T\mathbf{y}$. Then, the partial derivative $\partial f / \partial \alpha_i$ is
$$\frac{\partial f}{\partial \alpha_i} = -2q_i + 2\mathbf{Q}_i^T\boldsymbol{\alpha} + 2\lambda\, \mathrm{sgn}(\alpha_i) \sum_{j=1}^{K} P_{i,j} |\alpha_j|. \tag{18}$$
Setting this derivative to zero and holding $\boldsymbol{\alpha}_{1:K \setminus i}$ fixed, we update $\alpha_i$ according to:
$$\left(Q_{ii} + \lambda P_{ii}\right)\alpha_i + \mathrm{sgn}(\alpha_i)\, \lambda \sum_{j \neq i} P_{ij} |\alpha_j| = q_i - \sum_{j \neq i} Q_{ij} \alpha_j. \tag{19}$$
Then, we define the scalars $a$, $b$ and $c$. Let $a = Q_{ii} + \lambda P_{ii}$, $b = \lambda \sum_{j \neq i} P_{ij} |\alpha_j|$ and $c = q_i - \sum_{j \neq i} Q_{ij} \alpha_j$. The update equation can be denoted as:
$$\alpha_i = \begin{cases} (c + b)/a, & c < -b \\ 0, & -b \le c \le b \\ (c - b)/a, & c > b. \end{cases} \tag{20}$$
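Putting Equations (17)–(20) together, the following is a minimal coordinate-descent sketch in Python/NumPy, reusing the `psd_correct` and `penalty_matrix` helpers sketched above; the stopping rule and default parameters are illustrative assumptions.

```python
import numpy as np

def pen_solve(y, D, R, lam=1e-3, n_iters=100, tol=1e-6):
    """Coordinate descent for Eq. (15) with the updates of Eqs. (19)-(20)."""
    K = D.shape[1]
    P = psd_correct(penalty_matrix(R))   # PSD, nonnegative penalty matrix
    Q, q = D.T @ D, D.T @ y              # Q = D^T D, q = D^T y
    alpha = np.zeros(K)
    for _ in range(n_iters):
        alpha_old = alpha.copy()
        for i in range(K):
            a = Q[i, i] + lam * P[i, i]
            b = lam * (P[i] @ np.abs(alpha) - P[i, i] * abs(alpha[i]))
            c = q[i] - (Q[i] @ alpha - Q[i, i] * alpha[i])
            if c > b:                    # Eq. (20), c > b branch
                alpha[i] = (c - b) / a
            elif c < -b:                 # Eq. (20), c < -b branch
                alpha[i] = (c + b) / a
            else:                        # soft-thresholded to zero
                alpha[i] = 0.0
        if np.linalg.norm(alpha - alpha_old) < tol:  # termination condition
            break
    return alpha
```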

4. Results

In this section, to validate the superiority of our proposal, we compare the proposed PENRC (pixelwise) and J-PENRC with both single pixel-based and spatial information-based algorithms, namely KNN [36], SRC [15], CRC [16], the fused representation-based classification (FRC) method [25], the elastic net representation-based classification (ENRC) method [25], the nearest regularized subspace (NRS) classifier [18], shape-adaptive joint sparse representation (SA-JSR) [24] and weighted joint nearest neighbor and sparse representation (WJNN-JSR) [37]. All experiments are conducted using MATLAB R2014b on a 2.50 GHz PC with 8.0 GB RAM.

4.1. Data Sets

In this paper, we chose three HSI data sets for experimental evaluation.
The first testing data set is the Indian Pines dataset. The scene was obtained by the AVIRIS sensor over the Indian Pines test site in Northwest Indiana [38]. The size of the image is 145 × 145 pixels with 224 spectral reflectance bands, with wavelengths ranging from 0.4 μm to 2.5 μm. After removing the crops with little coverage, we chose 9 kinds of crops in the given ground truth: corn-notill, corn-mintill, grass-pasture, grass-trees, hay-windrowed, soybean-notill, soybean-mintill, soybean-clean and woods. Figure 1a,b illustrate the corresponding false-color composition and ground truth map, respectively.
The second data set is the Pavia Centre data set, acquired by the ROSIS sensor during a flight campaign over Pavia. The geometric resolution is 1.3 m and the image size is 1096 × 715 × 102. Some samples in the image contain no information and must be discarded before analysis. For Pavia Centre, we chose nine classes in the given ground truth: water, trees, asphalt, self-blocking bricks, bitumen, tiles, shadows, meadow and bare soil. Figure 2a,b illustrate the corresponding false-color composition and ground truth map, respectively.
The third data set is the Pavia University data set, also collected by the ROSIS sensor. The image size is 610 × 340 pixels, and it contains 103 spectral bands. The Pavia University dataset contains nine classes in the given ground truth: asphalt, meadows, gravel, trees, painted metal sheets, bare soil, bitumen, self-blocking bricks and shadows. Figure 3a,b illustrate the corresponding false-color composition and ground truth map, respectively.

4.2. Parameter Analysis

During the experiments, we used three evaluation indicators to measure the classification performance: OA, AA and Kappa [39]. OA (overall accuracy) represents the proportion of correctly classified atoms among the total number of testing atoms, while AA (average accuracy) is the mean of the per-class accuracies. Kappa measures the agreement between the classification map and the ground truth, corrected by the number of agreements that would be expected by chance. Detailed definitions of each indicator can be found in [40].
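For reference, a small sketch of the three indicators computed from a confusion matrix, assuming NumPy arrays of true and predicted labels (an illustrative reimplementation, not the evaluation code of [39,40]).

```python
import numpy as np

def evaluate(y_true, y_pred):
    classes = np.unique(y_true)
    cm = np.array([[np.sum((y_true == t) & (y_pred == p)) for p in classes]
                   for t in classes], dtype=float)   # confusion matrix
    oa = np.trace(cm) / cm.sum()                     # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))       # mean per-class accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / cm.sum() ** 2
    kappa = (oa - pe) / (1 - pe)                     # chance-corrected agreement
    return oa, aa, kappa
```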
There are two main parameters (the number of adaptive dictionary atoms $K$ and the balancing parameter $\lambda$) that have a significant impact on the classification results of our proposed PENRC. In this section, we analyze the impact of the two parameters by sweeping the chosen parameter space and find the optimal parameters according to Figure 4. For Indian Pines, we chose 10% of pixels per class as training samples. For Pavia Center, we chose 100 pixels per class as training samples, and the same number for the Pavia University dataset. From Figure 4, we can see that OA increases first and then decreases as the $K$ value increases: too few adaptive dictionary atoms lack sufficient locality information, and too many dictionary atoms may introduce redundant category information. With the value of $K$ fixed, the classification accuracy reaches a local maximum at an appropriate value of $\lambda$. Based on the maximum OA shown in Figure 4, we set $K$ to 20 and $\lambda$ to $1 \times 10^{-3}$, $1 \times 10^{-2}$ and $1 \times 10^{-4}$ for Indian Pines, Pavia Center and Pavia University, respectively.

4.3. Comparisons with Other Approaches

To avoid any bias, we repeated the experiments five times and reported the average classification accuracy.
For Indian Pines, we employ 10% of labeled samples per class as the training set and the rest as the testing set. The detailed partition strategy is illustrated in Table 1. Table 2 reports the classification performance of our proposed PENRC and J-PENRC as well as the chosen comparison algorithms, with the optimal results for each class indicated in bold. For certain classes, such as grass-pasture, grass-trees, hay-windrowed and woods, the classification accuracies of our proposed PENRC and J-PENRC are above 98%, and for hay-windrowed they reach 100%. For the soybean-clean category, our algorithm improves the classification accuracy by 19.08% relative to ENRC, the best of the chosen comparison algorithms for this class. Furthermore, from Table 2, we can clearly see that our algorithms are optimal in terms of OA, AA and Kappa. In order to demonstrate the effectiveness of our algorithm more comprehensively, we also compare the OAs calculated under different numbers of training samples. The classification results are shown in Figure 5, where the abscissa represents the number of training samples per class and the ordinate represents the classification accuracy. The dashed lines represent the OAs of the pixelwise algorithms, and the solid lines represent the OAs of the algorithms based on spatial information. From Figure 5, we can see that even in the case of insufficient training samples, our algorithm achieves an ideal classification result. Furthermore, our algorithms are consistently optimal compared to the same kind of contrast algorithms.
For Pavia Center, we employ 100 labeled samples per class as the training set and 2500 per class as the testing set. The detailed partition strategy is illustrated in Table 1. Table 3 illustrates the classification performance of our proposed PENRC and J-PENRC against the other chosen algorithms, with the optimal results for each class indicated in bold. For meadow, the classification accuracy of our proposed PENRC is above 99.6%. For some classes, such as asphalt and tile, the classification accuracies of our J-PENRC are above 99%, and for water, the classification accuracy of both PENRC and J-PENRC reaches 100%. Furthermore, Table 3 illustrates that our proposed algorithms are optimal in terms of OA, AA and Kappa compared to the other chosen algorithms. In order to further prove the effectiveness of our algorithm, we also compare the OAs of the chosen algorithms under different numbers of training samples. The classification results are shown in Figure 5, with the number of training samples ranging from 50 to 300 samples per class. It can be seen from Figure 5 that, compared with similar algorithms, our algorithm always has the best classification effect.
With regard to the Pavia University dataset, we randomly selected 100 labeled samples per class as the training set and 800 per class from the rest as the testing set (the shadows class only contains 947 labeled samples). The detailed partition strategy is illustrated in Table 1. Table 4 presents the classification results of our proposed PENRC and J-PENRC against the other comparison algorithms, with the optimal results for each class denoted in bold. For bitumen, the classification accuracy of our proposed PENRC reaches 99.17%. For some classes, such as gravel and bare soil, the classification accuracy of our J-PENRC is above 97%, and for meadows, painted metal sheets and shadows, the classification accuracy of J-PENRC reaches 100%. In addition, Table 4 illustrates that our proposed algorithms are optimal in terms of OA, AA and Kappa compared to the other chosen algorithms. In order to further prove the effectiveness of our algorithm, we also compared the OAs of the chosen algorithms with different numbers of training samples. The classification results are shown in Figure 5, with the number of training samples ranging from 50 to 300 samples per class. It is evident that our algorithm always attains the best performance.

4.4. Computational Complexity

In this section, we compare the computational complexity of each classifier on the Indian Pines, Pavia University and Pavia Centre datasets. All of the above experiments were executed five times to avoid any bias. Table 5 illustrates the total time of algorithm execution and verification. All experimental settings and parameters were the same as described above. As can be seen from Table 5, ENRC has a lower time complexity than PENRC. There are two reasons for this. First, ENRC uses artificial prior information to set a fixed weight parameter combining the $\ell_1$-norm and $\ell_2$-norm, while PENRC automatically learns this weighting through the similarity matrix. Second, the approximation algorithms used to solve the two models differ in time complexity due to the difference in the mathematical models. On the other hand, Table 5 also lists the time complexity with and without the local adaptive dictionary (LAD). Obviously, the use of LAD substantially reduces the computational complexity of PENRC and yields better classification performance.

5. Conclusions

In this paper, we proposed a hyperspectral image classification algorithm named PENRC. The locally constrained dictionary was first constructed to reduce the computation costs. Then, by introducing a correlation matrix, PENRC was constructed to realize group sparsity with self-balancing between the $\ell_1$-norm and $\ell_2$-norm. The pairwise elastic net model was proven capable of grouping selection of highly correlated data by establishing local, or pairwise, tradeoffs based on atom similarity, thereby yielding more robust weight coefficients. To further improve the classification performance, we also introduced spatial information and proposed the J-PENRC model. The experimental results on real hyperspectral images verified that the proposed algorithms outperform the existing representation-based classifiers. Compared to the existing pixelwise and spatial-based algorithms, experiments on the chosen Indian Pines dataset verified the effectiveness of our proposed PENRC and J-PENRC in quantitative and qualitative terms.

Author Contributions

All authors have made great contributions to the work. Conceptualization, Y.Z., Y.M. and X.M.; Software, Y.Z. and X.M.; Writing—original draft, Y.M. and Y.Z.; Writing—review and editing, H.L., S.Z. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant no. 61903279 and 61773295) and NSFC (grant no. 61906140), NSFC-CAAC (grant no. U1833119), Hubei Natural Science Foundation for Distinguished Young Scholars (2020CFA063) and National Food and Strategic Reserves Administration Foundation (grant no. LQ2018501).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSI        Hyperspectral Image
SR         Sparse Representation
CR         Collaborative Representation
PENRC      Pairwise Elastic Net Representation-Based Classification
J-PENRC    Joint Pairwise Elastic Net Representation-Based Classification
SVM        Support Vector Machine
GMM        Gaussian Mixture Model
MLC        Maximum-Likelihood Classifier
SRC        Sparse Representation Classification
CRC        Collaborative Representation Classification
JSRC       Joint Sparse Representation Classification
NRS        Nearest Regularized Subspace
cdSRC      Class-Dependent Sparse Representation Classifier
KCRC       Kernel-Based CRC
NJCRC      Nonlocal Joint Collaborative Representation
K-NN       K-Nearest Neighbor
SAJSRC     Shape-Adaptive Joint Sparse Representation Classification
FRC        Fused Representation-Based Classification
ENRC       Elastic Net Representation-Based Classification
PEN        Pairwise Elastic Net
SA         Shape Adaptive
QP         Quadratic Program
WJNN-JSR   Weighted Joint Nearest Neighbor and Sparse Representation

References

1. Bykov, A.; Zherebtsov, E.; Dremin, V.; Popov, A.; Doronin, A.; Meglinski, I. Hyperspectral Skin Imaging with Artificial Neural Networks Validated by Optical Biotissue Phantoms. In Proceedings of the Computational Optical Sensing and Imaging, Munich, Germany, 24–27 June 2019; Optical Society of America: Washington, DC, USA, 2019; p. CW1A–3.
2. Keshava, N. Distance metrics and band selection in hyperspectral processing with applications to material identification and spectral libraries. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1552–1565.
3. Bishop, C.A.; Liu, J.G.; Mason, P.J. Hyperspectral remote sensing for mineral exploration in Pulang, Yunnan Province, China. Int. J. Remote Sens. 2011, 32, 2409–2426.
4. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
5. Li, W.; Prasad, S.; Fowler, J.E. Decision fusion in kernel-induced spaces for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2013, 52, 3399–3411.
6. Li, W.; Prasad, S.; Fowler, J.E.; Bruce, L.M. Locality-preserving dimensionality reduction and classification for hyperspectral image analysis. IEEE Trans. Geosci. Remote Sens. 2011, 50, 1185–1198.
7. Zhang, Y.; Ma, Y.; Dai, X.; Li, H.; Mei, X.; Ma, J. Locality-constrained sparse representation for hyperspectral image classification. Inf. Sci. 2021, 546, 858–870.
8. Dian, R.; Li, S.; Fang, L. Learning a low tensor-train rank representation for hyperspectral image super-resolution. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2672–2683.
9. Dong, W.; Wang, H.; Wu, F.; Shi, G.; Li, X. Deep spatial–spectral representation learning for hyperspectral image denoising. IEEE Trans. Comput. Imaging 2019, 5, 635–648.
10. Sellami, A.; Dupé, F.X.; Cagna, B.; Kadri, H.; Ayache, S.; Artières, T.; Takerkart, S. Mapping individual differences in cortical architecture using multi-view representation learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
11. Ghamisi, P.; Maggiori, E.; Li, S.; Souza, R.; Tarablaka, Y.; Moser, G.; De Giorgi, A.; Fang, L.; Chen, Y.; Chi, M.; et al. New frontiers in spectral-spatial hyperspectral image classification: The latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning. IEEE Geosci. Remote Sens. Mag. 2018, 6, 10–43.
12. Sellami, A.; Farah, I. Spectra-spatial graph-based deep restricted Boltzmann networks for hyperspectral image classification. In Proceedings of the 2019 PhotonIcs & Electromagnetics Research Symposium-Spring (PIERS-Spring), Rome, Italy, 17–20 June 2019; pp. 1055–1062.
13. Mei, X.; Pan, E.; Ma, Y.; Dai, X.; Huang, J.; Fan, F.; Du, Q.; Zheng, H.; Ma, J. Spectral-spatial attention networks for hyperspectral image classification. Remote Sens. 2019, 11, 963.
14. Lei, Z.; Zeng, Y.; Liu, P.; Su, X. Active deep learning for hyperspectral image classification with uncertainty learning. IEEE Geosci. Remote Sens. Lett. 2021.
15. Li, C.; Ma, Y.; Mei, X.; Liu, C.; Ma, J. Hyperspectral image classification with robust sparse representation. IEEE Geosci. Remote Sens. Lett. 2016, 13, 641–645.
16. Jia, S.; Deng, X.; Zhu, J.; Xu, M.; Zhou, J.; Jia, X. Collaborative representation-based multiscale superpixel fusion for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7770–7784.
17. Chen, Y.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985.
18. Li, W.; Tramel, E.W.; Prasad, S.; Fowler, J.E. Nearest regularized subspace for hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 2013, 52, 477–489.
19. Cui, M.; Prasad, S. Class-dependent sparse representation classifier for robust hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2014, 53, 2683–2695.
20. Zhang, L.; Yang, M.; Feng, X. Sparse representation or collaborative representation: Which helps face recognition? In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 471–478.
21. Li, W.; Du, Q. Joint within-class collaborative representation for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2200–2208.
22. Li, W.; Du, Q.; Xiong, M. Kernel collaborative representation with Tikhonov regularization for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2014, 12, 48–52.
23. Li, J.; Zhang, H.; Huang, Y.; Zhang, L. Hyperspectral image classification by nonlocal joint collaborative representation with a locally adaptive dictionary. IEEE Trans. Geosci. Remote Sens. 2013, 52, 3707–3719.
24. Fu, W.; Li, S.; Fang, L.; Kang, X.; Benediktsson, J.A. Hyperspectral image classification via shape-adaptive joint sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 9, 556–567.
25. Li, W.; Du, Q.; Zhang, F.; Hu, W. Hyperspectral image classification by fusing collaborative and sparse representations. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4178–4187.
26. Hui, Z.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. 2005, 67, 768.
27. Lorbert, A.; Eis, D.; Kostina, V.; Blei, D.; Ramadge, P. Exploiting covariate similarity in sparse regression via the pairwise elastic net. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 477–484.
28. Chen, S.; Donoho, D. Basis pursuit. In Proceedings of the 1994 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 31 October–2 November 1994; Volume 1, pp. 41–44.
29. Gill, P.R.; Wang, A.; Molnar, A. The in-crowd algorithm for fast basis pursuit denoising. IEEE Trans. Signal Process. 2011, 59, 4595–4605.
30. Mallat, S.G.; Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
31. Bioucas-Dias, J.M.; Plaza, A.; Dobigeon, N.; Parente, M.; Du, Q.; Gader, P.; Chanussot, J. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 354–379.
32. Huang, S.; Zhang, H.; Pižurica, A. A robust sparse representation model for hyperspectral image classification. Sensors 2017, 17, 2087.
33. Fang, L.; Li, S.; Kang, X.; Benediktsson, J.A. Spectral–spatial hyperspectral image classification via multiscale adaptive sparse representation. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7738–7749.
34. Bandos, T.V.; Bruzzone, L.; Camps-Valls, G. Classification of hyperspectral images with regularized linear discriminant analysis. IEEE Trans. Geosci. Remote Sens. 2009, 47, 862–873.
35. Friedman, J.; Hastie, T.; Höfling, H.; Tibshirani, R. Pathwise coordinate optimization. Ann. Appl. Stat. 2007, 1, 302–332.
36. Ma, L.; Crawford, M.M.; Tian, J. Local manifold learning-based k-nearest-neighbor for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 4099–4109.
37. Tu, B.; Huang, S.; Fang, L.; Zhang, G.; Wang, J.; Zheng, B. Hyperspectral image classification via weighted joint nearest neighbor and sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4063–4075.
38. Gualtieri, J.A.; Cromp, R. Support vector machines for hyperspectral remote sensing classification. In Proceedings of the 27th AIPR Workshop: Advances in Computer-Assisted Recognition, Washington, DC, USA, 14–16 October 1999; International Society for Optics and Photonics: Bellingham, WA, USA, 1999; Volume 3584, pp. 221–232.
39. Ma, Y.; Zhang, Y.; Mei, X.; Dai, X.; Ma, J. Multifeature-based discriminative label consistent K-SVD for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4995–5008.
40. Richards, J.A.; Richards, J. Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 1999; Volume 3.
Figure 1. Indian Pines dataset. (a) Composite color image. (b,c) Ground truth.
Figure 2. Pavia Center dataset. (a) Composite color image. (b,c) Ground truth.
Figure 3. Pavia University dataset. (a) Composite color image. (b,c) Ground truth.
Figure 4. Effects of the number of adaptive dictionary atoms K and balancing parameter λ. (a) Indian Pines dataset, (b) Pavia Center dataset and (c) Pavia University dataset.
Figure 5. Classification performance for different numbers of training samples per class. (a) Indian Pines dataset, (b) Pavia Center dataset and (c) Pavia University dataset.
Table 1. List of the number of samples involved in training and testing for each class in the Indian Pines, Pavia Center and Pavia University datasets.

No. | Indian Pines: Class | Training | Testing | Pavia Center: Class | Training | Testing | Pavia University: Class | Training | Testing
1 | Corn-notill | 142 | 1286 | Water | 100 | 2500 | Asphalt | 100 | 800
2 | Corn-mintill | 83 | 747 | Trees | 100 | 2500 | Meadows | 100 | 800
3 | Grass-pasture | 49 | 434 | Meadow | 100 | 2500 | Gravel | 100 | 800
4 | Grass-trees | 73 | 657 | Self-Blocking Bricks | 100 | 2500 | Trees | 100 | 800
5 | Hay-windrowed | 48 | 430 | Bare Soil | 100 | 2500 | Painted metal sheets | 100 | 800
6 | Soybean-notill | 98 | 874 | Asphalt | 100 | 2500 | Bare Soil | 100 | 800
7 | Soybean-mintill | 246 | 2209 | Bitumen | 100 | 2500 | Bitumen | 100 | 800
8 | Soybean-clean | 60 | 533 | Tile | 100 | 2500 | Self-Blocking Bricks | 100 | 800
9 | Woods | 127 | 1138 | Shadows | 100 | 2500 | Shadows | 100 | 800
Table 2. Classification results of Indian Pines by pixelwise algorithms (KNN, SRC, CRC, FRC, ENRC, NRS and PENRC) and spatial-based algorithms (SA-JSR, WJNN-JSR and J-PENRC). Bold indicates the best result.

No. | KNN | SRC | CRC | FRC | ENRC | NRS | PENRC | SA-JSR | WJNN-JSR | J-PENRC
1 | 58.44 | 59.48 | 66.49 | 64.03 | 57.69 | **88.99** | 78.44 | 91.60 | 94.16 | **96.75**
2 | 54.69 | 62.28 | 63.39 | 64.73 | 67.38 | **89.60** | 78.12 | 86.75 | 92.50 | **97.10**
3 | 95.00 | 90.38 | 96.54 | 97.96 | 93.09 | 63.78 | **98.46** | 94.01 | 99.77 | **100**
4 | 96.70 | 98.97 | 96.70 | 93.91 | 99.46 | 89.96 | **98.98** | **100** | **100** | **100**
5 | **100** | 98.44 | 99.22 | 99.22 | 98.18 | 98.62 | **100** | **100** | **100** | **100**
6 | 62.68 | 65.33 | 38.10 | 52.87 | 70.16 | 70.30 | **75.62** | 93.59 | 94.17 | **97.90**
7 | 79.20 | 78.21 | 90.42 | 90.65 | 82.11 | 69.57 | **90.79** | 95.25 | 95.97 | **97.44**
8 | 51.72 | 51.72 | 41.38 | 52.66 | 59.60 | 58.45 | **78.68** | 91.18 | 92.32 | **95.92**
9 | 93.41 | 94.00 | 98.83 | 95.46 | 96.60 | 98.05 | **99.56** | 97.89 | 98.33 | **99.82**
OA (%) | 75.55 | 76.31 | 78.06 | 80.25 | 79.16 | 86.09 | **87.70** | 94.40 | 96.00 | **98.05**
AA (%) | 76.90 | 77.65 | 77.65 | 79.80 | 80.62 | 80.93 | **88.57** | 94.47 | 96.36 | **98.33**
Kappa | 71.27 | 72.14 | 72.14 | 76.54 | 75.57 | 82.39 | **85.44** | 93.42 | 95.31 | **97.71**
Table 3. Classification results of Pavia Center by pixelwise algorithms (KNN, SRC, CRC, FRC, ENRC, NRS and PENRC) and spatial-based algorithms (SA-JSR, WJNN-JSR and J-PENRC). Bold indicates the best result.

No. | KNN | SRC | CRC | FRC | ENRC | NRS | PENRC | SA-JSR | WJNN-JSR | J-PENRC
1 | 99.11 | 99.67 | 99.67 | 100 | 99.81 | 100 | 100 | 100 | 100 | 100
2 | 89.56 | 76.37 | 82.33 | 84.17 | 79.67 | 91.85 | 89.17 | 93.67 | 87.11 | 94.67
3 | 87.89 | 90.21 | 88.00 | 86.31 | 92.33 | 87.67 | 99.67 | 98.72 | 95.51 | 99.33
4 | 84.33 | 79.32 | 24.56 | 87.42 | 80.17 | 76.29 | 93.16 | 99.82 | 96.60 | 97.31
5 | 88.89 | 89.50 | 67.50 | 84.50 | 89.67 | 85.26 | 96.09 | 99.00 | 81.83 | 93.08
6 | 88.11 | 77.83 | 97.67 | 79.67 | 76.85 | 97.31 | 80.73 | 68.67 | 97.52 | 99.41
7 | 86.44 | 88.81 | 86.10 | 84.43 | 88.23 | 83.83 | 94.25 | 96.83 | 85.14 | 97.28
8 | 95.33 | 97.01 | 99.03 | 97.21 | 98.15 | 99.50 | 99.50 | 95.04 | 96.83 | 99.60
9 | 100 | 93.00 | 82.33 | 93.42 | 95.50 | 99.50 | 94.62 | 99.71 | 100 | 100
OA (%) | 91.07 | 87.06 | 80.19 | 88.56 | 88.93 | 91.24 | 94.11 | 94.83 | 92.97 | 97.78
AA (%) | 91.07 | 87.06 | 80.19 | 88.56 | 88.93 | 91.24 | 94.11 | 94.83 | 92.97 | 97.78
Kappa | 89.96 | 86.46 | 78.15 | 87.13 | 87.54 | 90.51 | 93.38 | 94.19 | 92.14 | 97.50
Table 4. Classification results of Pavia University by pixelwise algorithms (KNN, SRC, CRC, FRC, ENRC, NRS and PENRC) and spatial-based algorithms (SA-JSR, WJNN-JSR and J-PENRC). Bold indicates the best result.

No. | KNN | SRC | CRC | FRC | ENRC | NRS | PENRC | SA-JSR | WJNN-JSR | J-PENRC
1 | 70.83 | 57.67 | 36.00 | 56.83 | 60.67 | 91.17 | 72.00 | 94.16 | 70.00 | 87.00
2 | 70.33 | 78.00 | 75.00 | 80.17 | 68.50 | 71.00 | 97.33 | 92.50 | 81.33 | 100
3 | 69.67 | 72.83 | 92.67 | 67.33 | 73.33 | 77.50 | 97.00 | 98.59 | 82.00 | 98.67
4 | 88.67 | 89.50 | 96.67 | 94.33 | 92.00 | 95.33 | 93.83 | 100 | 96.73 | 97.13
5 | 98.50 | 99.50 | 100 | 99.83 | 99.27 | 99.17 | 99.67 | 100 | 99.83 | 100
6 | 66.33 | 65.17 | 57.33 | 64.00 | 68.33 | 83.00 | 95.83 | 94.17 | 85.83 | 97.00
7 | 85.50 | 87.00 | 92.17 | 85.83 | 87.00 | 86.50 | 99.17 | 95.97 | 95.00 | 99.00
8 | 66.83 | 67.83 | 20.17 | 72.00 | 69.00 | 64.50 | 86.83 | 92.32 | 80.17 | 94.33
9 | 100 | 94.95 | 93.33 | 97.33 | 98.17 | 99.67 | 97.83 | 98.33 | 99.83 | 100
OA (%) | 79.63 | 79.14 | 73.70 | 79.74 | 79.57 | 85.31 | 93.28 | 96.00 | 87.81 | 96.69
AA (%) | 79.63 | 79.14 | 73.70 | 79.74 | 79.57 | 85.31 | 93.28 | 96.00 | 87.81 | 96.69
Kappa | 77.08 | 76.31 | 70.42 | 77.21 | 77.02 | 83.48 | 92.44 | 95.31 | 86.29 | 96.17
Table 5. Computational complexity comparison on the Indian Pines dataset.

Method | With/Without LAD | Running Time (s) | Overall Accuracy
ENRC | – | 32.53 | 79.16
PENRC | Without LAD | 3472.68 | 85.44
PENRC | With LAD | 72.35 | 87.70
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
