Abstract

Intelligent internet data mining is an important application of AIoT (Artificial Intelligence of Things), and it requires constructing large training sets from internet data, including images, videos, and other information. Among these resources, a hyperspectral database is also necessary for image processing and machine learning. The internet environment provides abundant hyperspectral data, but these data carry no class labels and therefore have limited value for applications. It is thus important to assign class labels to these hyperspectral data through machine learning-based classification. In this paper, we present a quasiconformal mapping kernel machine learning-based intelligent hyperspectral data classification algorithm for internet-based hyperspectral data retrieval. The contributions include three points: a quasiconformal mapping-based multiple kernel learning network framework is proposed for hyperspectral data classification; the Mahalanobis distance kernel function is used as the network nodes, with higher discriminative ability than Euclidean distance-based kernel learning; and an objective function measuring class discriminative ability is proposed to seek the optimal parameters of the quasiconformal mapping projection. Experiments show that the proposed scheme is effective for hyperspectral image classification and retrieval.

1. Introduction

Intelligent data mining is an important issue in AIoT (Artificial Intelligence of Things), and with the development of machine learning, large training datasets of images and videos are necessary for learning tasks. Among these applications, hyperspectral databases are also very necessary for hyperspectral image processing and machine learning. Internet-based hyperspectral data retrieval is therefore an important AIoT problem, and it is also an effective way to create a large-scale hyperspectral training database for downstream applications. The internet environment provides abundant hyperspectral data resources, but they are mixed with other complex data. Moreover, the hyperspectral data carry no class labels and no detailed class knowledge, so they have little value without class information. Further machine learning on hyperspectral data is therefore necessary for internet-based data retrieval. Intelligent hyperspectral data retrieval in the internet environment combines AI (Artificial Intelligence) and IoT (Internet of Things).

Hyperspectral data-based machine learning is a feasible and effective way to extract features for image retrieval. Machine learning methods are divided into unsupervised and supervised learning. Unsupervised learning includes multidimensional scaling, NMF, ICA, neighborhood preserving embedding, Locality Preserving Projection (LPP) [1], and other computing methods [2]; supervised learning includes generalized discriminant analysis [3], uncorrelated discriminant vector analysis [4], and some acceleration algorithms [5, 6]. In recent years, kernel-based machine learning algorithms have been presented for feature extraction, including supervised kernel-based LPP, local structure supervised feature extraction [7], kernel subspace LDA [8], kernel MSE [9], and the quasiconformal mapping-based kernel machine [10]. Many kernel learning methods have been proposed to improve the accuracy of practical kernel learning systems, for example, sparse multiple kernel learning [11], large-scale multiple kernel learning [12], and Lp-norm multiple kernel learning [13]. With the development of deep learning theory, the deep learning-based multikernel machine has become an effective framework, and such learning methods have been widely used in image analysis [14, 15], image annotation [16], image classification [17], image segmentation [18], and anomaly detection [19]. Researchers have proposed deep kernel learning, namely, LMKL [20]; in other work, the estimate of the missing error is adjusted instead of the double objective function [21]. Recent research also includes machine learning-based image processing [22, 23] in the internet environment as an application of AI and IoT. Mathematically, it has been proved that multilayer structures improve the richness of the representation, and researchers have combined the support vector machine with multiple classifiers and used an adaptive back-propagation algorithm to update coefficients and weights [20, 26, 27].

Kernel-based machine learning on hyperspectral data is proposed here to retrieve spectral data in the internet environment. In the algorithm, we propose quasiconformal mapping-based kernel learning for hyperspectral data classification in internet-based data retrieval. The Mahalanobis distance kernel function is applied to extract nonlinear features, with higher discriminative ability than Euclidean distance-based kernel learning. The objective function of quasiconformal kernel learning, built on the Fisher criterion, is proposed to seek the optimal parameters of the quasiconformal mapping projection. The proposed scheme is effective for hyperspectral image retrieval in the internet environment.

2. Proposed Algorithm

2.1. Motivation and Framework

Intelligent hyperspectral retrieval extracts spectral features through sensing data processing and analysis. Motivated by the fact that kernel machine-based spectrum learning is effective for nonlinear classification, we present a framework of quasiconformal mapping-based multiple kernel learning with Mahalanobis distance kernel functions. The contributions include three points: a quasiconformal mapping-based multiple kernel learning network framework is proposed for hyperspectral data classification; the Mahalanobis distance kernel function is used as the network nodes, with higher discriminative ability than Euclidean distance-based kernel learning; and an objective function measuring class discriminative ability is proposed to seek the optimal parameters of the quasiconformal mapping projection. The performance is improved in two ways: one is to optimize the data structure in the kernel empirical space with the quasiconformal mapping, and the other is to improve the discriminant ability with the Mahalanobis distance-based kernel. The proposed algorithm characterizes the data effectively for complex visual learning tasks. The learning framework of hyperspectral image classification is presented in Figure 1.

2.2. Quasiconformal Kernel Mapping Learning

Kernel-based learning is used in data classification through the "empirical kernel map." Suppose that the kernel matrix $K = (k(x_i, x_j))_{n \times n}$ is decomposed as $K = P \Lambda P^{T}$, where $\Lambda$ is a diagonal matrix containing the positive eigenvalues of $K$ and $P$ is the matrix of the corresponding eigenvectors. The empirical kernel map is then
\[ \Phi(x) = \Lambda^{-1/2} P^{T} \left( k(x, x_1), k(x, x_2), \ldots, k(x, x_n) \right)^{T}. \]
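As an illustration, the following Python sketch computes the empirical kernel map for new samples from an RBF basic kernel; the kernel choice, the parameter gamma, and the variable names are illustrative assumptions rather than specifications from the paper.

```python
# Minimal sketch of the empirical kernel map, assuming an RBF basic kernel.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def empirical_kernel_map(X_train, X_new, gamma=1.0, tol=1e-10):
    # Decompose K = P Lambda P^T and map x to Lambda^{-1/2} P^T k_x,
    # where k_x = (k(x, x_1), ..., k(x, x_n))^T.
    K = rbf_kernel(X_train, X_train, gamma)
    eigval, eigvec = np.linalg.eigh(K)
    keep = eigval > tol                          # keep strictly positive eigenvalues
    Lam_inv_sqrt = np.diag(1.0 / np.sqrt(eigval[keep]))
    k_x = rbf_kernel(X_new, X_train, gamma)      # shape (m, n)
    return k_x @ eigvec[:, keep] @ Lam_inv_sqrt

X_train = np.random.randn(50, 30)                # e.g., 30 PCA spectral features
X_new = np.random.randn(5, 30)
Phi = empirical_kernel_map(X_train, X_new)       # samples in the empirical kernel space
```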

Different kernels have different classification abilities. A kernel function based on feature similarity is one in which the features enter only through their distance, that is, their similarity, in the function expression. For sample features $x$ and $y$ with Euclidean distance $d(x, y) = \| x - y \|$, the general form of such a kernel function is
\[ k(x, y) = f\left( \| x - y \| \right). \]

The RBF kernel is the most typical representative of this kind of kernel function, together with the negative distance kernel, the logarithmic kernel, and the Bn spline kernel. For this type of kernel function, the Euclidean distance can simply be replaced with the Mahalanobis distance $d_M(x, y) = \sqrt{(x - y)^{T} M (x - y)}$, that is,
\[ k_M(x, y) = f\left( d_M(x, y) \right). \]

In particular, a typical Mahalanobis distance RBF kernel function is
\[ k_M(x, y) = \exp\left( -\frac{(x - y)^{T} M (x - y)}{2\sigma^{2}} \right). \]
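A minimal Python sketch of the Mahalanobis distance RBF kernel above; the regularized inverse covariance used as the Mahalanobis matrix M and the value of sigma are illustrative choices.

```python
# Sketch of the Mahalanobis-distance RBF kernel with an inverse-covariance M.
import numpy as np

def mahalanobis_rbf_kernel(X, Y, M, sigma=1.0):
    # k(x, y) = exp(-(x - y)^T M (x - y) / (2 sigma^2))
    diff = X[:, None, :] - Y[None, :, :]                  # shape (n, m, d)
    d2 = np.einsum('nmd,de,nme->nm', diff, M, diff)       # squared Mahalanobis distances
    return np.exp(-d2 / (2 * sigma**2))

X = np.random.randn(40, 30)
M = np.linalg.inv(np.cov(X, rowvar=False) + 1e-6 * np.eye(30))  # regularized inverse covariance
K = mahalanobis_rbf_kernel(X, X, M)
```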

Similarly, given the inner product $\langle x, y \rangle = x^{T} y$ of two features, the general form of the kernel function based on the feature inner product is
\[ k(x, y) = f\left( x^{T} y \right). \]

The polynomial kernel is the most typical representative of this kind of kernel function, together with the sigmoid kernel. For this type of kernel function, a transformation is required when extending to the Mahalanobis distance kernel. In the Euclidean case, the inner product of the two features satisfies
\[ x^{T} y = \frac{1}{2}\left( d^{2}(x, 0) + d^{2}(y, 0) - d^{2}(x, y) \right), \]
where $0$ denotes the origin.

Therefore, the Mahalanobis version of the inner product can be written as
\[ \langle x, y \rangle_{M} = \frac{1}{2}\left( d_M^{2}(x, 0) + d_M^{2}(y, 0) - d_M^{2}(x, y) \right). \]

In this way, the Mahalanobis inner product can be obtained by calculating the Mahalanobis distances between each feature and the origin $0$ and between the two features, which simplifies to
\[ \langle x, y \rangle_{M} = x^{T} M y. \]

In particular, a typical Mahalanobis distance polynomial kernel function is
\[ k_M(x, y) = \left( x^{T} M y + c \right)^{d}. \]
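A corresponding sketch of the Mahalanobis distance polynomial kernel; the degree and offset are placeholder values.

```python
# Sketch of the Mahalanobis-distance polynomial kernel (x^T M y + c)^degree.
import numpy as np

def mahalanobis_poly_kernel(X, Y, M, degree=2, c=1.0):
    # Replaces the Euclidean inner product x^T y with the Mahalanobis inner product x^T M y.
    return (X @ M @ Y.T + c) ** degree
```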

In this paper, we introduce the quasiconformal kernel $\tilde{k}(x, y)$ as follows:
\[ \tilde{k}(x, y) = c(x)\, c(y)\, k(x, y), \]
where $x$, $y$ are the sample vectors and $c(x)$ is the factor function
\[ c(x) = b_0 + \sum_{m=1}^{N_{X}} b_m e^{-\delta \| x - \tilde{a}_m \|^{2}}, \]
where $\delta$ is a free parameter, the $\tilde{a}_m$ are called the "expansion vectors (XVs)", $N_{X}$ is the number of XVs, and $b_m$ ($m = 0, 1, \ldots, N_{X}$) are the "expansion coefficients" associated with the $\tilde{a}_m$.

For multiple kernels, the quasiconformal mapping kernel is described as
\[ \tilde{k}(x, y) = c(x)\, c(y) \sum_{p=1}^{M} \beta_p k_p(x, y), \]
where $k_p$ is the $p$th basic kernel, $M$ is the number of basic kernels in the combination, $\beta_p$ is the weight of the $p$th basic kernel function, and $c(x)$ is the factor function defined by
\[ c(x) = b_0 + \sum_{m=1}^{N_{X}} b_m e^{-\delta \| x - \tilde{a}_m \|^{2}}, \]
where the $\tilde{a}_m$ are selected from the training samples and the $b_m$ are the combination coefficients. The kernel $\tilde{k}$ satisfies the Mercer condition, can be rewritten through the optimized transformation of the empirical kernel map, and is a linear combination of the basic kernels.
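The following sketch illustrates the assumed form of the factor function and the combined quasiconformal multiple kernel; taking the expansion vectors as a subset of the training samples and the choice of delta are illustrative assumptions.

```python
# Sketch of the quasiconformal factor function c(x) and the combined kernel
# K_tilde(x, y) = c(x) c(y) * sum_p beta_p k_p(x, y).
import numpy as np

def factor_function(X, XVs, b, delta=1.0):
    # c(x) = b[0] + sum_m b[m] * exp(-delta * ||x - XVs[m-1]||^2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(XVs**2, 1)[None, :] - 2 * X @ XVs.T
    return b[0] + np.exp(-delta * d2) @ b[1:]

def quasiconformal_multikernel(X, Y, base_kernels, beta, XVs, b, delta=1.0):
    # base_kernels: list of callables k_p(X, Y) returning kernel matrices.
    K = sum(w * k(X, Y) for w, k in zip(beta, base_kernels))
    return np.outer(factor_function(X, XVs, b, delta),
                    factor_function(Y, XVs, b, delta)) * K
```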

Given the basic kernels $k_p$, the weight vector $\beta = (\beta_1, \ldots, \beta_M)$ and the expansion coefficients $b = (b_0, b_1, \ldots, b_{N_X})$ of the optimized multiple kernel $\tilde{k}$ are to be optimized for the classification task. Finally, the joint formulation can be formed as
\[ \max_{\beta,\, b} \; J\left( \tilde{K}(\beta, b) \right). \]

Here, $J(\tilde{K})$ measures the class discriminative ability. The problem can be solved in two stages: in the first stage, the centered kernel alignment [28] is applied to define the objective function for the kernel weights $\beta$; in the second stage, a Fisher-based and margin-based objective function is used to solve the expansion coefficients $b$.

Step 1. Optimize the weight vector of kernels.
In multikernel learning, the crucial step is to select adaptive weights for the multiple kernels. The weight vector is solved with centered kernel alignment [22]. Let $H = I - \frac{1}{n} \mathbf{1}\mathbf{1}^{T}$, where $I$ is the identity matrix and $\mathbf{1}$ is a vector with all entries equal to 1; then $K_c = H K H$ is the centered kernel matrix of $K$. The ideal (optimal) kernel is $K^{*} = y y^{T}$, built from the label vector $y$, and the alignment objective is
\[ J(\beta) = \frac{\left\langle \tilde{K}_c(\beta),\, K^{*}_c \right\rangle_F}{\left\| \tilde{K}_c(\beta) \right\|_F \left\| K^{*}_c \right\|_F}, \]
where $\langle \cdot, \cdot \rangle_F$ denotes the Frobenius inner product, $\| A \|_F = \sqrt{\operatorname{tr}(A^{T} A)}$ is the Frobenius norm, and $\tilde{K}(\beta) = \sum_{p} \beta_p K_p$ is the combined kernel matrix. Maximizing this alignment can be transformed into the quadratic programming (QP) problem
\[ \min_{v \geq 0} \; v^{T} Q v - 2 v^{T} a, \]
where $Q_{pq} = \left\langle K_{p,c}, K_{q,c} \right\rangle_F$, $a_p = \left\langle K_{p,c}, K^{*}_c \right\rangle_F$, and the weight vector is obtained by normalization, $\beta = v / \| v \|$. This QP problem can be effectively solved with the OPTI toolbox [28].
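As a lightweight illustration of Step 1, the sketch below weights each basic kernel by its centered alignment with the ideal kernel y y^T and normalizes the result; this is a simplified alternative to the full QP formulation and assumes binary labels.

```python
# Simplified centered-alignment weighting of basic kernels (not the full QP).
import numpy as np

def center_kernel(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix I - (1/n) 1 1^T
    return H @ K @ H

def alignment_weights(kernels, y):
    # kernels: list of (n, n) basic kernel matrices; y: labels in {-1, +1}.
    K_star = np.outer(y, y)                    # ideal (target) kernel y y^T
    a = []
    for K in kernels:
        Kc = center_kernel(K)
        # Centered alignment with the ideal kernel (up to a constant factor).
        a.append(np.sum(Kc * K_star) / (np.linalg.norm(Kc) + 1e-12))
    a = np.maximum(np.array(a), 0.0)           # keep nonnegative weights
    return a / (a.sum() + 1e-12)               # normalized combination weights beta
```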

Step 2. Optimize the coefficients of kernels.
The coefficient vector $b$ is solved based on the Fisher criterion, which is defined as
\[ J_{F}(b) = \frac{b^{T} B_{0} b}{b^{T} W_{0} b}, \]
where $B_{0}$ and $W_{0}$ are the between-class and within-class matrices constructed from the basic kernel. Then $b^{*} = \arg\max_{b} J_{F}(b)$, where $b^{*}$ is solved from the generalized eigenvalue problem of $B_{0}$ and $W_{0}$. Since $W_{0}$ may be nonsymmetric or singular, $b$ can instead be solved with the iterative gradient update
\[ b^{(t+1)} = b^{(t)} + \eta(t) \frac{\partial J_{F}}{\partial b}, \qquad \eta(t) = \eta_{0}\left( 1 - \frac{t}{T} \right), \]
where $\eta(t)$ is the learning rate of the $t$th iteration, $\eta_{0}$ is the initial learning rate, and $T$ is the number of total iterations.
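A minimal sketch of Step 2 under stated assumptions: the between-class and within-class matrices B0, W0 and the gradient grad_J are placeholders whose construction follows [10]; the generalized eigenvalue solution is used when W0 is well conditioned, and the decaying-learning-rate gradient ascent is used otherwise.

```python
# Sketch of solving the expansion coefficients b by the Fisher criterion.
import numpy as np
from scipy.linalg import eigh

def solve_coefficients(B0, W0, grad_J=None, b_init=None, eta0=0.1, T=100):
    try:
        # Leading eigenvector of the generalized problem B0 b = lambda W0 b
        # (assumes B0, W0 symmetric and W0 positive definite).
        eigval, eigvec = eigh(B0, W0)
        return eigvec[:, -1]
    except np.linalg.LinAlgError:
        # Fallback: gradient ascent with the decaying learning rate eta(t) = eta0 * (1 - t/T),
        # used when W0 is singular or the generalized problem is ill posed.
        b = b_init.copy()
        for t in range(T):
            b = b + eta0 * (1.0 - t / T) * grad_J(b)
        return b
```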

2.3. Metric Similarity-Based Learning
2.3.1. Similar/Dissimilar Function

Suppose an initial distance metric function $d(x, y)$ (generally the Euclidean distance) is given. The purpose of metric learning is to construct, from some prior information, a new distance that describes the sample features better than the initial metric. To achieve this, the new metric can be written as $d_{\text{new}}(x, y) = d(f(x), f(y))$; that is, by defining a mapping $f$ and applying the original distance metric to the mapped features, the metric learning problem is converted into the problem of learning the mapping function.

Given a sample set $X = \{x_1, x_2, \ldots, x_n\}$, with $x_i$ a sample in the set, a function $d(\cdot, \cdot)$ defined on the vector space is called a distance metric function if it satisfies the following properties: symmetry, $d(x, y) = d(y, x)$; nonnegativity, $d(x, y) \geq 0$; distinguishability, $d(x, y) = 0$ if and only if $x = y$; and the triangle inequality, $d(x, z) \leq d(x, y) + d(y, z)$. Among the metric functions that satisfy these properties, the Euclidean distance is the most common; it measures the absolute distance between spatial sample points and is defined as
\[ d(x, y) = \| x - y \|_{2} = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^{2}}. \]

The Euclidean distance is simple to calculate, but since the absolute distance it measures is tied directly to the coordinates of each point, its adaptability to the data is poor in terms of feature scale and the coupling between features. Cosine similarity mainly measures the consistency of direction between two vectors and is defined as
\[ \cos(x, y) = \frac{x^{T} y}{\| x \| \, \| y \|}. \]

Compared with the Euclidean distance, the cosine similarity measures the angle between the space vectors, which reflects the difference in the direction of the vector and is insensitive to the absolute value.

The Minkowski distance is a general expression for a class of distance functions. Given two $n$-dimensional vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, the Minkowski distance is defined as
\[ d_{p}(x, y) = \left( \sum_{k=1}^{n} | x_k - y_k |^{p} \right)^{1/p}, \]
where different values of $p$ yield different types of distance. When $p = 1$, the distance is the Manhattan distance, $d_{1}(x, y) = \sum_{k=1}^{n} | x_k - y_k |$, the sum of the absolute coordinate differences of the two points on the standard coordinate system. When $p = 2$, the distance is the Euclidean distance. When $p \to \infty$, the Minkowski distance becomes the Chebyshev distance, $d_{\infty}(x, y) = \max_{k} | x_k - y_k |$, the maximum coordinate difference. Like the Euclidean distance, the Minkowski distance is still related to the scale of the features, and the correlation between features is not considered.

The Mahalanobis distance was proposed by P. C. Mahalanobis. It is defined as
\[ d_{M}(x, y) = \sqrt{(x - y)^{T} M (x - y)}, \]
where $\Sigma$ is the data covariance matrix and $M = \Sigma^{-1}$ is the Mahalanobis matrix. When $M$ is the identity matrix, the Mahalanobis distance degenerates into the Euclidean distance, showing that the Euclidean distance is a special case of the Mahalanobis distance. The greatest advantage of the Mahalanobis distance is its ability to remove the coupling between features while remaining scale-invariant.
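For reference, the fixed metrics discussed above can be computed directly with scipy; the random vectors below are placeholders.

```python
# Classical fixed distance metrics computed with scipy.spatial.distance.
import numpy as np
from scipy.spatial import distance

x, y = np.random.randn(30), np.random.randn(30)
X = np.random.randn(200, 30)
VI = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance -> Mahalanobis matrix

d_euc = distance.euclidean(x, y)              # Minkowski with p = 2
d_man = distance.cityblock(x, y)              # Minkowski with p = 1 (Manhattan)
d_che = distance.chebyshev(x, y)              # Minkowski limit p -> infinity
cos_sim = 1.0 - distance.cosine(x, y)         # cosine similarity
d_mah = distance.mahalanobis(x, y, VI)        # Mahalanobis distance
```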

As can be seen from the above definition, the traditional distance measurement function is fixed in form and can be calculated directly according to the formula without a process of “learning” to the sample. Obviously, this approach does not meet the diverse task requirements nor does it make full use of the information contained in the sample features. Therefore, it is necessary to construct a suitable distance metric function by learning the multidimensional information provided by the sample features for specific problems, so as to provide the best expression of feature similarity.

2.3.2. Distance Measure Function

Linear metric learning obtains a new metric function through a linear transformation; that is, the mapping function has the form $f(x) = W x$, where $W$ is a projection matrix. The purpose of learning is to find a suitable matrix $W$. When the initial distance measure is the Euclidean distance, the new metric function is
\[ d_{W}(x, y) = \| W x - W y \|_{2}. \]

For a real matrix $W$, the matrix $M = W^{T} W$ is a positive semidefinite symmetric matrix, and the formula can be written as
\[ d_{W}(x, y) = \sqrt{(x - y)^{T} W^{T} W (x - y)} = \sqrt{(x - y)^{T} M (x - y)}. \]

It can be seen from the above equation that, under the effect of the matrix $W$, the samples are linearly mapped to a new feature space. When $W$ is the identity matrix, the distance is the Euclidean distance. When $W$ is a diagonal matrix, the original space is scaled, which is equivalent to weighting the features. When $W$ is an orthogonal matrix, the original space is rotated. When $W$ is a general square matrix, the original space is simultaneously scaled and rotated. When $W$ is not square, the transformation also performs dimensionality reduction in addition to rotation and scaling.
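A small numerical check, on illustrative data, that the linear map W and the Mahalanobis matrix M = W^T W induce the same distance.

```python
# Verify that ||Wx - Wy|| equals sqrt((x - y)^T M (x - y)) with M = W^T W.
import numpy as np

W = np.random.randn(10, 30)                    # projection matrix (here also reduces dimension)
M = W.T @ W                                    # positive semidefinite Mahalanobis matrix
x, y = np.random.randn(30), np.random.randn(30)

d_proj = np.linalg.norm(W @ x - W @ y)
d_mahal = np.sqrt((x - y) @ M @ (x - y))
assert np.isclose(d_proj, d_mahal)
```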

The above formula is similar in form to the Mahalanobis distance metric, but the traditional Mahalanobis metric uses the inverse of the covariance matrix as the Mahalanobis matrix, whereas $M$ in the above equation is extended to a general positive semidefinite matrix; linear metric learning is therefore also known as Mahalanobis metric learning. From the perspective of using sample information, the traditional Mahalanobis matrix only uses the internal structure of the data and focuses on describing its distribution, while the Mahalanobis matrix obtained by metric learning makes full use of the relationship between the features and the category labels, focusing on features that reflect the class differences, to achieve a better metric function. In general, a metric learning problem can be transformed into a constrained optimization problem:
\[ \min_{M \succeq 0} \; L(M; D) + \lambda R(M) \quad \text{s.t.} \quad C(M; D), \]
where $L(M; D)$ represents the loss function on the training set $D$; $R(M)$ is a regularization term used to correct overfitting; $\lambda$ is the preset regularization factor, which adjusts the influence of the regularization term during training; and $C(M; D)$ is the constraint on the training set. Different learning algorithms can be derived depending on the loss function, the regularization term, and the constraints.

2.4. Similar/Dissimilar Learning Criterions

Two popular criteria, the Fisher criterion and the large margin nearest neighbor criterion, are used in similar/dissimilar-based learning.

2.4.1. Fisher Criterion

The method uses the pairwise constraint information provided by the samples as prior information: the basic idea is to minimize the distances between similar sample pairs while controlling the distances between dissimilar pairs, constructing a convex optimization problem whose solution is the learned Mahalanobis matrix. First, given a sample set $X$, two constraint sets can be obtained depending on whether the sample pairs belong to the same category: a similar constraint set $S$ and a dissimilar constraint set $D$. If the categories of a sample pair are the same, the pair belongs to the set $S$; if the categories are different, the pair belongs to the set $D$. If the Euclidean distance is used as the initial metric, and the learned metric should make the distances between similar sample pairs as small as possible, a convex optimization problem can be constructed:
\[ \min_{M \succeq 0} \sum_{(x_i, x_j) \in S} d_{M}^{2}(x_i, x_j) \quad \text{s.t.} \quad \sum_{(x_i, x_j) \in D} d_{M}(x_i, x_j) \geq 1, \]
where $M$ is a positive semidefinite matrix. The constraint is added mainly to remove the trivial solution $M = 0$. In the specific solution, an iterative update can be used: in each iteration, the Newton descent method performs a gradient step to obtain an updated Mahalanobis matrix, which is then iteratively projected onto the constraint sets. Although the algorithm is relatively simple to implement, the computation is heavy for large datasets because all pairs of similar and dissimilar samples in the whole dataset must be constructed, and the convergence is also slow.

First, by introducing the projection matrix $W$, the distance between a pair of points becomes
\[ d_{W}(x_i, x_j) = \| W x_i - W x_j \|. \]

Considering the constraint set $S$, after the action of the projection matrix $W$, the sum of the squared distances between all pairs of points is
\[ S_{S}(W) = \sum_{(x_i, x_j) \in S} \| W x_i - W x_j \|^{2}. \]

The sum of the squared distances between all pairs of points in the constraint set $D$ can be calculated as
\[ S_{D}(W) = \sum_{(x_i, x_j) \in D} \| W x_i - W x_j \|^{2}. \]

A good projection matrix $W$ should maximize the distances between the samples in the constraint set $D$ and reduce the distances between the samples in the constraint set $S$. Therefore, an objective function can be constructed from the ratio of $S_{D}(W)$ and $S_{S}(W)$, and the optimal $W$ is the solution of
\[ W^{*} = \arg\max_{W} \frac{S_{D}(W)}{S_{S}(W)}. \]

Thus, the Mahalanobis distance matrix after learning is $M = W^{*T} W^{*}$.
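A minimal sketch of this construction, assuming the similar and dissimilar pair sets are given: pairwise scatter matrices are built for S and D, and the leading generalized eigenvectors of (S_D, S_S) form the rows of W; the regularization term and output dimension are illustrative.

```python
# Sketch of ratio-based Fisher metric learning from similar/dissimilar pairs.
import numpy as np
from scipy.linalg import eigh

def pair_scatter(X, pairs):
    # Sum over pairs of (x_i - x_j)(x_i - x_j)^T.
    S = np.zeros((X.shape[1], X.shape[1]))
    for i, j in pairs:
        d = X[i] - X[j]
        S += np.outer(d, d)
    return S

def fisher_metric(X, similar_pairs, dissimilar_pairs, out_dim=10, reg=1e-6):
    S_S = pair_scatter(X, similar_pairs) + reg * np.eye(X.shape[1])  # within-pair scatter
    S_D = pair_scatter(X, dissimilar_pairs)                          # between-pair scatter
    eigval, eigvec = eigh(S_D, S_S)            # generalized eigenproblem S_D w = lambda S_S w
    W = eigvec[:, -out_dim:].T                 # leading eigenvectors as projection rows
    return W.T @ W                             # learned Mahalanobis matrix M
```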

2.5. Procedural Steps and Discussion

The procedure is shown in Figure 2. It includes three stages: multikernel optimization, training, and testing. The first step is to optimize the weight vector of the multiple quasiconformal kernels with the centered kernel alignment; the second step is to optimize the quasiconformal kernel coefficients with the Fisher and maximum margin criteria; and the final step is classifier design with the optimal kernels.

Regarding the learning bounds of the proposed algorithm, the learning criterion is to maximize the accuracy on the test data. Given a function $f : X \to \mathbb{R}$, the thresholded version of $f$ is defined as $\operatorname{sgn}(f(x))$, where the kernel-based classifiers are thresholds of kernel expansions of the form $f(x) = \sum_{i} \alpha_i \tilde{k}(x, x_i)$, and the bounded norm is $\| f \|_{\tilde{k}}^{2} = \sum_{i,j} \alpha_i \alpha_j \tilde{k}(x_i, x_j)$.

For any $\delta > 0$, with probability at least $1 - \delta$ over the data, the generalization error of each such function is bounded by its empirical margin error plus a complexity term that grows with the largest eigenvalue of the kernel matrix $\tilde{K}$.

3. Experiments and Analysis

3.1. Experiment Setting

The performance of the proposed intelligent hyperspectral classification method is evaluated. The accuracy of spectrum classification is an important index for evaluating classification performance. The experiments were carried out on two hyperspectral imager sensing datasets, the Indian Pines dataset and the Pavia University dataset. The Indian Pines data were acquired from an airborne platform under various spectral and spatial resolutions; the data include 224 bands covering 0.4-2.5 μm, and nine classes of images are used in the experiment. The Pavia University data were collected with a reflective optics system imaging spectrometer (ROSIS); the data include 115 bands, and the performance on 9 classes of images is verified. Except for the feature dimensions of the participating categories, the two experiments were otherwise identical. For the classification features, considering the computational efficiency and the stability of the Mahalanobis matrix, the dimension of the original spectral features is reduced by PCA, and the reduced features are normalized to eliminate the deviation caused by the sampling method. For the first experiment, the top 30 principal components are selected for classification, so the feature dimension is 30; for the second experiment, the first 40 principal components are selected, so the feature dimension is 40. For the classifier settings, the preset parameter values are selected by cross-validation with a standard multiclass SVM. For the kernel functions, the Gaussian kernel and the Mahalanobis Gaussian kernel are used as the basic kernel functions, the scale parameter is set within [0.01, 2], and the number of basic kernels is 10. For the evaluation indices, the overall classification accuracy (OA) and the Kappa coefficient (KC) are used as performance indicators, and the classifier training time, test time, and number of support vectors are also collected.

The Indian Pines dataset was collected from an airborne platform under various spectral and spatial resolutions, and its spectral curves represent different remote sensing environments. The data cube has 224 spectral bands covering the 0.4-2.5 μm range, with a spatial resolution of 20 m per pixel. After removing the noisy and water vapor absorption bands, 200 bands are used in the experiments. The whole scene consists of 145 × 145 pixels and 16 classes of objects of interest, with class sizes ranging from 20 to 2468 pixels; 9 classes are used in the experiments. One example is shown in Figure 3.

The Pavia University data were acquired by the reflective optics system imaging spectrometer (ROSIS) over the urban area of the University of Pavia, northern Italy. The dataset consists of 115 spectral bands and 610 × 340 pixels with a spatial resolution of 1.3 m per pixel. Several undesirable bands affected by atmospheric absorption are discarded, leaving 103 bands in the 0.43-0.86 μm region. We cut a patch consisting of 9 classes of land cover from the scene. An example is shown in Figure 4.

3.2. Experiments on the Performance on Quasiconformal Kernel Mapping

In these experiments, we evaluate the performance of the quasiconformal kernel mapping on the two databases, tested with both the polynomial kernel and the Gaussian kernel. The Kernel Sparse Representation Classifier (KSRC) and the Support Vector Classifier (SVC) are used for classification. For comparison, we also implement other algorithms, including SVM [29], RMKL-SVM [30], and POL-KSRC [31]. The experimental results on the two datasets are shown in Tables 1 and 2.

3.3. Experiments on the Performance on Mahalanobis Distance Kernel

In the comparison, the average multikernel and different multiple kernel learning methods are used as the multikernel combination coefficient algorithms, and the Euclidean distance Gaussian kernel and the Mahalanobis Gaussian kernel are used for comparison, respectively. In these experiments, we implement four algorithms as follows: (1) Euclidean-MKL1 [32]: the Euclidean distance kernel function, where each kernel is combined with the same weight, that is, the combination coefficient of each kernel is the reciprocal of the number of kernels (see [32] for details); (2) Mahalanobis-MKL1: the Mahalanobis distance kernel function is used, with the same kernel learning as Euclidean-MKL1; (3) Euclidean-MKL2 [33]: the Euclidean distance kernel function is used, with the combination coefficients as described in [33]; (4) Mahalanobis-MKL2: the Mahalanobis distance kernel function is used, with the same kernel learning as Euclidean-MKL2.

The experimental results of the different methods on the Indian Pines dataset are shown in Figure 5 and Tables 3 and 4, and the results on the Pavia University dataset are shown in Figure 6 and Tables 5 and 6. As these results show, the proposed Mahalanobis distance kernel achieves the highest accuracy.

3.4. Experiment Comparisons

For comparison, we carried out experiments comparing the performance of the proposed algorithm with the following 14 methods: (1) RBF: the RBF Euclidean kernel as the kernel function in the SVM [31]; (2) Poly: the polynomial Euclidean kernel as the kernel function in the SVM [31]; (3) Mahal-RBF: the Mahalanobis distance-based RBF kernel as the kernel function in the SVM [34]; (4) Mahal-Poly: the Mahalanobis distance-based polynomial kernel as the kernel function in the SVM [34]; (5) SK-CV (RBF): an SVM with a single kernel, adopting the RBF kernel as the kernel function [35]; (6) SK-Poly: a standard SVM with a single kernel, adopting a polynomial kernel as the kernel function [35]; (7) NMF-MKL: the nonnegative matrix factorization (NMF) MKL proposed by Gu et al. [28], which combines multiple kernels with NMF; (8) KNMF-MKL: the kernel-based nonnegative matrix factorization (KNMF) MKL method, also proposed by Gu et al., which combines multiple kernels with the KNMF method; (9) Euclidean-MKL1 [32]: the Euclidean distance kernel function, where each kernel is combined with the same weight, that is, the combination coefficient of each kernel is the reciprocal of the number of kernels (see [32] for details); (10) Euclidean-MKL2 [33]: the Euclidean distance kernel function is used, with the combination coefficients as described in [33]; (11) Mahalanobis-MKL1: the proposed Mahalanobis distance-based multiple kernel function, with the same learning criterion as Euclidean-MKL1; (12) Mahalanobis-MKL2: the Mahalanobis distance kernel function, with the same learning criterion as Euclidean-MKL2; (13) Mahalanobis-QMKL1: the proposed Mahalanobis distance-based multiple quasiconformal kernel function, with the same learning criterion as Euclidean-MKL1; (14) Mahalanobis-QMKL2: the Mahalanobis distance quasiconformal kernel function, with the same learning criterion as Euclidean-MKL2.

As shown in Table 7, the proposed scheme is effective for hyperspectral image classification. The quasiconformal mapping-based multiple kernel learning network framework is effective and feasible for hyperspectral data classification; the Mahalanobis distance kernel function, used as the network nodes, has higher discriminative ability than Euclidean distance-based kernel learning; and the objective function measuring class discriminative ability finds the optimal parameters of the quasiconformal mapping projection. Compared with other kernel-based learning methods, the proposed algorithm performs best.

3.5. Computation Efficiency and Practical Applications

In the experiments, the computational cost was recorded on a PC with a 2.6 GHz i5-3320 processor and 4 GB RAM. Different computational costs are obtained for different features. The proposed algorithm omits the parameter optimization process for a given feature dimension, so high computation efficiency is achieved. Both the Euclidean-based method and the Mahalanobis-based method adopt nonnegative matrix factorization to optimize the kernel weights, and a higher feature dimension requires more time because more memory is needed to store the kernel matrices and more dimensions must be computed. Accordingly, the Mahalanobis-based kernel learning achieves higher computation efficiency than the Euclidean-based method under the same feature dimension.

For a practical application system, the framework is shown in Figure 7. For quasiconformal mapping kernel machine learning-based intelligent hyperspectral data retrieval in the internet environment, the framework includes three stages: image collection, online image processing, and image transmission, of which the second stage is the most important. Different from the experimental setting, the practical application system also includes image collection and image transmission.

4. Conclusion

In this paper, we present a quasiconformal mapping kernel machine learning-based intelligent hyperspectral data classification algorithm for internet-based hyperspectral data retrieval, with wide application in intelligent internet data mining as an application of AIoT (Artificial Intelligence of Things). The contributions of the algorithm lie in the following points: a quasiconformal mapping-based multiple kernel learning network framework is proposed for hyperspectral data classification; the Mahalanobis distance kernel function is used as the network nodes, with higher discriminative ability than Euclidean distance-based kernel learning; and an objective function measuring class discriminative ability is proposed to seek the optimal parameters of the quasiconformal mapping projection. Experiments show that the proposed scheme is effective for hyperspectral image classification. The proposed algorithm is advantageous for constructing large training sets from internet data, including images, videos, and other information.

Data Availability

We have not used specific data from other sources for the simulations. The two popular hyperspectral datasets used in this paper, the Indian Pines dataset and the Pavia University dataset, can be downloaded freely from http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. The proposed algorithm is implemented in Python.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

We would like to thank Dr. Li Li, Prof. Junbao Li, and Prof. Yanfeng Gu for providing the kernel-based learning programs of their papers for comparison in the experiments. This work is supported by the National Science Foundation of China under Grant No. 61871142.