Abstract

Thanks to the rapid development of hyperspectral sensors, hyperspectral videos (HSV) can now be collected with high temporal and spectral resolution and used for dynamic monitoring of targets that are invisible to the eye, such as chemical gas plume tracking. However, using such large-scale sequential data effectively is challenging, because processing the data directly places heavy demands on computation and memory. This paper presents a key-frame and target detection algorithm based on cumulative tensor CANDECOMP/PARAFAC (CP) factorization (CTCF) to select the frames in which the target appears, together with a novel super-resolution (SR) method based on sparse-based tensor Tucker factorization (STTF) to improve the spatial resolution. In the CTCF method, the HSV sequence is treated as a cumulative tensor, and the correlation between adjacent frames is exploited by CP tensor approximation. In the proposed STTF-based SR method, each HSV frame is treated as a third-order tensor; the frame super-resolution problem is then transformed into the estimation of the dictionaries along three dimensions and of the core tensor. To promote a sparse core tensor, a regularizer is incorporated to model the high spatial-spectral correlations. The estimation of the core tensor and of the dictionaries along three dimensions is formulated as a sparse-based Tucker factorization of each HSV frame. Experimental results on a real HSV data set demonstrate the superiority of the proposed CTCF and STTF algorithms over the compared state-of-the-art target detection and SR approaches.

1. Introduction

Hyperspectral imaging has become one of the most popular research fields owing to its ability to identify materials from very high spectral resolution and coverage. In the last decade, researchers have focused on the processing and application of hyperspectral images (HSIs), such as denoising [1, 2], feature extraction [3, 4], classification [5–11], detection [12–14], and super-resolution (fusion) [15–18]. This section briefly reviews research in the latter two fields, which are related to this paper.

Basically, target detection is a kind of binary classification whose purpose is to label every image pixel as target or background. In HSIs, pixels whose spectral signature differs significantly from that of their neighboring background pixels are defined as spectral anomalies. Anomaly detectors are statistical or pattern recognition methods used to detect such distinct pixels. It is worth mentioning that spectral anomaly detection approaches [19–22], such as the Reed-Xiaoli (RX) algorithm [23], neither assume nor use prior information about the target spectral signature. In this paper, however, we focus on the detection of invisible gas plumes, and prior knowledge of the target's spectral characteristics is assumed to be available. In such cases, signature-based target detection algorithms are used instead of anomaly detection. In these algorithms, the spectral characteristics of the target can be represented by a target subspace or a single target spectrum [24]. Likewise, the characteristics of the background can be expressed statistically by a Gaussian distribution or by a subspace describing the local or global background statistics. In this category, the matched subspace detector (MSD) [25] is one of the most typical algorithms. In the MSD, the target pixel vectors are represented by a linear combination of the target spectral signature and the background spectral signature, which stand for the subspace target spectra and the subspace background spectra, respectively. Then, the generalized likelihood ratio test (GLRT) is applied, using projection matrices associated with the background subspace and the target-and-background subspace. Finally, the output of the GLRT is compared with a preset threshold to decide whether the target is absent or present. At the subpixel level, a single pixel may contain several distinct pure materials (endmembers); such a pixel is known as a mixed pixel. The presence of mixed pixels is a difficult problem caused by the low spatial resolution of HSIs. Accordingly, several unmixing approaches [26–28] have been designed to compute the fractional abundances of endmembers. In [29], a hyperspectral unmixing approach based on constrained matrix factorization (CMF) was proposed. Unlike conventional methods, each column vector of the endmember matrix is represented as a nonnegative linear combination of pixel spectra. After the endmember matrix and the corresponding fractional abundance matrix are obtained by solving optimization problems, the abundance map of the target endmember gives the detection result.

As mentioned before, HSIs often suffer from low spatial resolution. To acquire an HSI, the number of sun photons in each spectral band has to exceed a minimum value, and an HSI contains so many spectral bands that spatial resolution has to be sacrificed. Therefore, super-resolution (SR) techniques have aroused great interest in the last decade. Generally, SR methods for HSIs can be classified into four categories: Bayesian [30], component analysis [31], deep learning [32], and sparse representation. Owing to space limitations, we focus on sparse-based algorithms. In such HSI super-resolution schemes, images are expressed by dictionaries and the corresponding sparse coefficients. On the basis of the spatial-spectral sparsity of HSIs, the dictionaries and sparse coefficients are estimated jointly [33]. Huang et al. [34] introduced a method for fusing multispectral images (MSIs) with different spectral and spatial resolutions based on sparse matrix factorization. Akhtar et al. [35] presented an MSI-HSI fusion approach using sparse coding and Bayesian dictionary learning. Moreover, some algorithms based on matrix factorization [36–38] or unmixing [39] can also be regarded as sparse representation schemes, because the source images are decomposed into a basis and the corresponding coefficients. Yokoya et al. proposed a coupled nonnegative matrix factorization (CNMF) algorithm [40], in which unmixing techniques are employed to yield the endmember matrices and the high-resolution (HR) abundance matrices of the HSI. In [41], Lanaras et al. suggested a joint scheme to solve the spectral unmixing problems. In [42], Zhang et al. fused the low-resolution (LR) HSI and the HR-MSI based on group spectral embedding and low-rank factorization.

However, schemes based on matrix factorization cannot fully exploit the spatial-spectral correlations of HSIs. Treating HSIs as tensors is more natural, because an HSI can be expressed directly as a third-order tensor. In this paper, a detection algorithm based on cumulative tensor CP factorization (CTCF) is proposed. The sequential HSV data are expressed as a four-dimensional (4D) cumulative tensor, and factor matrices are obtained by decomposing the original 4D tensor with CP factorization. When a new frame arrives and is appended along the time dimension of the original tensor, the 4D cumulative tensor is updated together with its factor matrices. A CP tensor approximation of the new frame is then computed from the updated factor matrices, and the fitness between the new frame and its approximation is calculated. By comparing the fitness with a preset threshold, we decide whether the new frame continues to be used to update the cumulative tensor or is a key-frame in which the target is present. The CTCF-based method exploits not only the spatial-spectral correlations of HSIs through the tensor model, but also the temporal correlation between adjacent frames of the HSV.

On the other hand, tensor-based analysis has also been widely used in HSI super-resolution [43–45]. To the best of our knowledge, most SR algorithms enhance spatial resolution by fusing a high-resolution MSI (HR-MSI) and a low-resolution HSI (LR-HSI) of the same scene. Unfortunately, this is often impractical in real applications: in some situations, the LR-HSI is the only data available. In this paper, we propose an SR algorithm using sparse-based tensor Tucker factorization (STTF). Inspired by Tucker factorization and related works, the HSV frames are represented as third-order tensors, which are approximated by the product of a core tensor and the dictionaries along three dimensions (i.e., the dictionaries of the height mode, the width mode, and the spectral mode, called "three modes dictionaries" for short in the rest of this paper). The SR problem is thus transformed into the estimation of the three modes dictionaries and of the core tensor. Specifically, the spatial information is represented by the height mode dictionary and the width mode dictionary, the spectral information is represented by the spectral mode dictionary, and the correlations among the three modes dictionaries are modeled by the core tensor. HSIs are generally self-similar, so a sparse prior can be imposed on the core tensor; the estimation of the core tensor and the three modes dictionaries is then formulated as the STTF of the LR and HR HSV frames. In each iteration of STTF, the core tensor and the dictionaries are all updated, and accurate estimates are obtained when convergence is achieved.

The remainder of this paper is organized as follows. Section 2 presents the materials and methods, including the basic notations and preliminaries of tensor and tensor factorization, the proposed CTCF approach for key-frame detection, and the proposed STTF method for key-frame super-resolution problem. In Section 3, experimental results on real HSV and the discussions are given. The paper is summarized in Section 4 with ideas for future work along the path presented here.

2. Materials and Methods

2.1. Tensor Notations and Preliminaries
2.1.1. Tensor Notations

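In this paper, vectors are denoted by boldface lowercase letters (e.g., $\mathbf{x}$), matrices are denoted by boldface capital letters (e.g., $\mathbf{X}$), and tensors are denoted by Euler script letters (e.g., $\mathcal{X}$). Generally, a tensor is a kind of multidimensional array, denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Here, $\mathcal{X}$ is an Nth-order tensor and $I_n$ is the dimension of the nth mode. Obviously, vectors are first-order tensors and matrices are second-order tensors. We use $\mathbf{x}_{i_1 \cdots i_{n-1} : i_{n+1} \cdots i_N}$ to denote the mode-n fibers, which are the vectors obtained from $\mathcal{X}$ by varying the index $i_n$ with all other indexes fixed. The mode-n unfolding matrix of $\mathcal{X}$ is generated by placing all the mode-n fibers in a matrix as columns and is denoted by $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times I_1 \cdots I_{n-1} I_{n+1} \cdots I_N}$.

An important operation between a tensor and a matrix is the n-mode product, which is defined as

$$\mathcal{Z} = \mathcal{X} \times_n \mathbf{U}, \tag{1}$$

where $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$ and $\mathcal{Z} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}$. The elements of $\mathcal{X}$ are denoted by $x_{i_1 i_2 \cdots i_N}$, so the elements of $\mathcal{Z}$ are computed by

$$z_{i_1 \cdots i_{n-1} j_n i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j_n i_n}. \tag{2}$$

Given the definition of the n-mode product, we can obtain

$$\mathbf{Z}_{(n)} = \mathbf{U}\mathbf{X}_{(n)}. \tag{3}$$

For successive multiplications of a tensor by matrices in distinct modes, the result is not affected by the multiplication order:

$$\mathcal{X} \times_m \mathbf{U} \times_n \mathbf{V} = \mathcal{X} \times_n \mathbf{V} \times_m \mathbf{U} \quad (m \neq n). \tag{4}$$

If the modes are the same, equation (4) is transformed into

$$\mathcal{X} \times_n \mathbf{U} \times_n \mathbf{V} = \mathcal{X} \times_n (\mathbf{V}\mathbf{U}). \tag{5}$$

Suppose that $\{\mathbf{U}_n \in \mathbb{R}^{J_n \times I_n}\}_{n=1}^{N}$ is a collection of matrices; we define the tensor $\mathcal{Z}$ as

$$\mathcal{Z} = \mathcal{X} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N. \tag{6}$$

The vectorized form of equation (6) is

$$\mathbf{z} = (\mathbf{U}_N \otimes \mathbf{U}_{N-1} \otimes \cdots \otimes \mathbf{U}_1)\,\mathbf{x}, \tag{7}$$

where $\mathbf{z}$ ($\mathbf{z} = \mathrm{vec}(\mathcal{Z})$) and $\mathbf{x}$ ($\mathbf{x} = \mathrm{vec}(\mathcal{X})$) are the vectors obtained by stacking the mode-1 fibers of the tensors $\mathcal{Z}$ and $\mathcal{X}$. The Kronecker product is denoted by the symbol "$\otimes$".

Moreover, given the tensor $\mathcal{X}$, $\|\mathcal{X}\|_0$ represents the $\ell_0$-norm, which equals the number of nonzero elements of $\mathcal{X}$, $\|\mathcal{X}\|_1 = \sum_{i_1, \ldots, i_N} |x_{i_1 \cdots i_N}|$ denotes the $\ell_1$-norm, and $\|\mathcal{X}\|_F = \sqrt{\sum_{i_1, \ldots, i_N} x_{i_1 \cdots i_N}^2}$ denotes the Frobenius norm.

Finally, the definition of a rank-one tensor is introduced. The Nth-order tensor $\mathcal{X}$ is rank-one if it can be written as the outer product of N vectors, i.e., $\mathcal{X} = \mathbf{a}^{(1)} \circ \mathbf{a}^{(2)} \circ \cdots \circ \mathbf{a}^{(N)}$. The symbol "$\circ$" denotes the vector outer product [46].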
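The unfolding and n-mode product identities above (matrix form of the product, order independence for distinct modes, merging of products in the same mode, and the Kronecker form) can be checked numerically. The following is a minimal NumPy sketch; the helper names `unfold`, `fold`, and `mode_product` are illustrative and do not come from the paper, and the column ordering of the unfolding (earlier remaining index varies fastest) is an assumption chosen to agree with the Kronecker ordering used in the text.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.reshape(np.moveaxis(tensor, mode, 0),
                      (tensor.shape[mode], -1), order="F")

def fold(matrix, mode, shape):
    """Inverse of unfold for a tensor whose full shape is `shape`."""
    lead = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(matrix, lead, order="F"), 0, mode)

def mode_product(tensor, matrix, mode):
    """n-mode product, computed through the identity Z_(n) = U X_(n)."""
    shape = list(tensor.shape)
    shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))
u1 = rng.standard_normal((5, 2))
u2 = rng.standard_normal((6, 3))
u3 = rng.standard_normal((7, 4))

# Product along all three modes, and its vectorized Kronecker form.
z = mode_product(mode_product(mode_product(x, u1, 0), u2, 1), u3, 2)
z_vec = np.kron(u3, np.kron(u2, u1)) @ x.ravel(order="F")
print(np.allclose(z.ravel(order="F"), z_vec))
```

Computing the product through the unfolding keeps the code close to the notation; for large tensors, `np.einsum` or `np.tensordot` would avoid forming the intermediate matrix explicitly.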

2.1.2. Tensor Factorizations

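CANDECOMP/PARAFAC (CP) factorization decomposes a tensor into a sum of component rank-one tensors [47]. For example, given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, we may formulate it as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r, \tag{8}$$

where $R$ is a positive integer and $\mathbf{a}_r \in \mathbb{R}^{I}$, $\mathbf{b}_r \in \mathbb{R}^{J}$, and $\mathbf{c}_r \in \mathbb{R}^{K}$ ($r = 1, \ldots, R$). The elements of tensor $\mathcal{X}$ can be computed by

$$x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}. \tag{9}$$

CP factorization is illustrated in Figure 1.

The factorization result can be expressed by the factor matrices of the three dimensions. Factor matrices collect the vectors of the rank-one components; i.e.,

$$\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_R], \quad \mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_R], \quad \mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_R]. \tag{10}$$

Following [48], the CP model can be concisely represented as

$$\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]. \tag{11}$$

On the basis of the factor matrices, the mode-n unfolding matrices ($n = 1, 2, 3$) of $\mathcal{X}$ can be represented as

$$\mathbf{X}_{(1)} \approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}, \quad \mathbf{X}_{(2)} \approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{T}, \quad \mathbf{X}_{(3)} \approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{T}, \tag{12}$$

where the symbol "$\odot$" denotes the Khatri-Rao product [49]. In this way, loss functions can be modeled as the approximation of the mode-n unfolding matrices; the factor matrices of the CP factorization can then be obtained by solving the corresponding optimization problem.

Tucker factorization is another popular tensor decomposition approach [50]. It decomposes a tensor into a core tensor multiplied by a matrix along each mode. Thus, in the same case as above where $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, the factorization can be described as

$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C}, \tag{13}$$

where $\mathbf{A} \in \mathbb{R}^{I \times P}$, $\mathbf{B} \in \mathbb{R}^{J \times Q}$, and $\mathbf{C} \in \mathbb{R}^{K \times R}$ are factor matrices that can be regarded as the principal components in each mode. Therefore, Tucker factorization is a form of higher-order principal component analysis (PCA). Tensor $\mathcal{G} \in \mathbb{R}^{P \times Q \times R}$ is the core tensor, and its elements represent the level of correlation between the different components. Similar to (11), the Tucker model can be concisely represented by $[\![\mathcal{G}; \mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. Elementwise, equation (13) can be represented as

$$x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}. \tag{14}$$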

The Tucker factorization is illustrated in Figure 2.
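To make the two models concrete, the sketch below builds a third-order tensor from CP factor matrices, checks the three Khatri-Rao unfolding relations, and then verifies that CP is the special case of Tucker whose core tensor is superdiagonal. It is an illustrative NumPy sketch only; the function names and the unfolding convention (earlier remaining index varies fastest) are assumptions rather than the paper's implementation.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with the earlier remaining index varying fastest."""
    return np.reshape(np.moveaxis(tensor, mode, 0),
                      (tensor.shape[mode], -1), order="F")

def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with equal column count."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def cp_to_tensor(a, b, c):
    """Sum of R rank-one tensors a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def tucker_to_tensor(core, a, b, c):
    """Core tensor multiplied by one factor matrix along each mode."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)

rng = np.random.default_rng(0)
rank = 3
a, b, c = (rng.standard_normal((n, rank)) for n in (4, 5, 6))
x = cp_to_tensor(a, b, c)

# The three unfoldings expressed through Khatri-Rao products.
checks = [
    np.allclose(unfold(x, 0), a @ khatri_rao(c, b).T),
    np.allclose(unfold(x, 1), b @ khatri_rao(c, a).T),
    np.allclose(unfold(x, 2), c @ khatri_rao(b, a).T),
]

# CP as a Tucker model with a superdiagonal core of ones.
core = np.zeros((rank, rank, rank))
idx = np.arange(rank)
core[idx, idx, idx] = 1.0
checks.append(np.allclose(tucker_to_tensor(core, a, b, c), x))
print(checks)
```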

2.2. The Proposed CTCF-Based Detection Method

In this subsection, the optimization problem of updating factor matrix is presented, followed with the proposed cumulative tensor CP factorization (CTCF) of third-order tensors. It is then extended to Nth-order tensors. The CTCF-based detection method is described in the end of this subsection with its flowchart shown in Figure 3.

2.2.1. CP Tensor Approximation by Factor Matrices

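Similar to equation (12), the mode-n unfolding matrix of an Nth-order tensor $\mathcal{X}$ can be approximated by factor matrices; i.e.,

$$\mathbf{X}_{(n)} \approx \mathbf{A}^{(n)} \Big( \bigodot_{k \neq n} \mathbf{A}^{(k)} \Big)^{T}, \qquad \bigodot_{k \neq n} \mathbf{A}^{(k)} = \mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}, \tag{15}$$

where the factor matrices $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}$ are obtained by CP factorization. The corresponding loss function is

$$L = \frac{1}{2} \Big\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \Big( \bigodot_{k \neq n} \mathbf{A}^{(k)} \Big)^{T} \Big\|_F^2. \tag{16}$$

The Alternating Least Squares (ALS) algorithm is often applied to obtain the factor matrices by solving the following optimization problem for each mode in turn:

$$\mathbf{A}^{(n)} \leftarrow \arg\min_{\mathbf{A}^{(n)}} \frac{1}{2} \Big\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \Big( \bigodot_{k \neq n} \mathbf{A}^{(k)} \Big)^{T} \Big\|_F^2, \quad n = 1, \ldots, N. \tag{17}$$

When the tensor is updated, the new tensor can be computed from the updated factor matrices given by equation (17).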
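As an illustration of the ALS scheme, the following NumPy sketch fits a third-order CP model by solving one least-squares problem per mode in turn. It is a textbook ALS written under stated assumptions, not the authors' code: the function name `cp_als`, the random initialization, the stopping rule on the change in relative error, and the use of a pseudoinverse of the Gram matrix are all illustrative choices. The defaults mirror the settings reported later in the paper (at most 100 iterations, tolerance $10^{-8}$).

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with the earlier remaining index varying fastest."""
    return np.reshape(np.moveaxis(tensor, mode, 0),
                      (tensor.shape[mode], -1), order="F")

def khatri_rao(a, b):
    """Column-wise Kronecker product."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def cp_als(x, rank, n_iter=100, tol=1e-8, seed=0):
    """Rank-`rank` CP fit of a third-order tensor by alternating least squares.

    Returns the factor matrices and the relative error after each sweep.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((size, rank)) for size in x.shape]
    norm_x = np.linalg.norm(x)
    history = []
    for _ in range(n_iter):
        for n in range(3):
            first, second = [factors[m] for m in range(3) if m != n]
            # Later mode goes first in the Khatri-Rao product.
            kr = khatri_rao(second, first)
            # Gram matrix of kr equals the Hadamard product of the two Grams.
            gram = (first.T @ first) * (second.T @ second)
            factors[n] = unfold(x, n) @ kr @ np.linalg.pinv(gram)
        approx = np.einsum("ir,jr,kr->ijk", *factors)
        history.append(np.linalg.norm(x - approx) / norm_x)
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return factors, history

rng = np.random.default_rng(1)
a, b, c = (rng.standard_normal((n, 2)) for n in (6, 7, 8))
x = np.einsum("ir,jr,kr->ijk", a, b, c)
factors, history = cp_als(x, rank=2)
print(len(history), history[-1])
```

Because every mode update is an exact least-squares minimizer, the relative error cannot increase from one sweep to the next, which is why the sweep history is a convenient convergence monitor.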

2.2.2. CTCF of Third-Order Tensor

Generally, an image is a second-order tensor, so a sequence of images forms a third-order tensor, i.e., a video, which adds a temporal dimension to the two spatial dimensions. When a new video frame arrives and is appended along the time dimension of the original tensor, the result is defined as a three-dimensional (3D) cumulative tensor. As the number of new frames increases, the 3D cumulative tensor is updated frame by frame.

In conventional CP tensor approximation, whenever a new frame is added in the time dimension, the ALS algorithm has to be rerun to approximate the new cumulative tensor, which is a time-consuming process. In addition, the temporal correlation between neighboring frames is not exploited in the decomposition of the cumulative tensor. This paper proposes CTCF to update the CP factorization of the original cumulative tensor, obtain the updated factor matrices, and approximate the new frame.

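Given an original 3D cumulative tensor $\mathcal{X}_{old} \in \mathbb{R}^{I \times J \times t_{old}}$, the result of CP factorization is denoted by $[\![\mathbf{A}_{old}, \mathbf{B}_{old}, \mathbf{C}_{old}]\!]$. When a new tensor $\mathcal{X}_{new} \in \mathbb{R}^{I \times J \times t_{new}}$ is added in the time dimension, the updated cumulative tensor is $\mathcal{X} \in \mathbb{R}^{I \times J \times (t_{old} + t_{new})}$, of which the CP factorization appears as $[\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. We focus on obtaining $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ by updating $\mathbf{A}_{old}$, $\mathbf{B}_{old}$, and $\mathbf{C}_{old}$.

The updating process is carried out in an alternating way. First, the temporal factor matrix $\mathbf{C}$ is computed while the factor matrices $\mathbf{A}$ and $\mathbf{B}$ are fixed; i.e.,

$$\mathbf{C} \leftarrow \arg\min_{\mathbf{C}} \frac{1}{2} \big\| \mathbf{X}_{(3)} - \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{T} \big\|_F^2 = \arg\min_{\mathbf{C}^{(1)}, \mathbf{C}^{(2)}} \frac{1}{2} \left\| \begin{bmatrix} \mathbf{X}^{old}_{(3)} - \mathbf{C}^{(1)}(\mathbf{B} \odot \mathbf{A})^{T} \\ \mathbf{X}^{new}_{(3)} - \mathbf{C}^{(2)}(\mathbf{B} \odot \mathbf{A})^{T} \end{bmatrix} \right\|_F^2, \tag{18}$$

where $\mathbf{C} = [\mathbf{C}^{(1)}; \mathbf{C}^{(2)}]$ is divided into two terms. Since $\mathbf{A}$ and $\mathbf{B}$ are fixed as $\mathbf{A}_{old}$ and $\mathbf{B}_{old}$, the first row of (18) is minimized if $\mathbf{C}^{(1)} = \mathbf{C}_{old}$. To minimize the second row, according to (12), the optimal solution of $\mathbf{C}^{(2)}$ is $\mathbf{C}_{new} = \mathbf{X}^{new}_{(3)} \big( (\mathbf{B}_{old} \odot \mathbf{A}_{old})^{T} \big)^{\dagger}$, where the symbol "$\dagger$" denotes the Moore–Penrose pseudoinverse of a matrix [51]. So, $\mathbf{C}$ can be updated by appending $\mathbf{C}_{new}$, which is represented by

$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{old} \\ \mathbf{C}_{new} \end{bmatrix} = \begin{bmatrix} \mathbf{C}_{old} \\ \mathbf{X}^{new}_{(3)} \big( (\mathbf{B}_{old} \odot \mathbf{A}_{old})^{T} \big)^{\dagger} \end{bmatrix}. \tag{19}$$

Second, the factor matrix $\mathbf{A}$ is computed while the factor matrices $\mathbf{B}$ and $\mathbf{C}$ are fixed. Similar to (16), the loss function for estimating $\mathbf{A}$ is written as

$$L = \frac{1}{2} \big\| \mathbf{X}_{(1)} - \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T} \big\|_F^2. \tag{20}$$

Differentiating $L$ with respect to $\mathbf{A}$, we have

$$\frac{\partial L}{\partial \mathbf{A}} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B}) - \mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B}). \tag{21}$$

To simplify equation (21), denote $\mathbf{P} = \mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})$ and $\mathbf{Q} = (\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B})$; thus, when $\partial L / \partial \mathbf{A} = 0$, we have $\mathbf{A} = \mathbf{P}\mathbf{Q}^{-1}$. According to [47], $\mathbf{Q}$ can be rewritten as

$$\mathbf{Q} = \mathbf{C}^{T}\mathbf{C} \circledast \mathbf{B}^{T}\mathbf{B}, \tag{22}$$

where "$\circledast$" denotes the elementwise (Hadamard) product.

To compute $\mathbf{P}$, we also divide $\mathbf{X}_{(1)}$ and $\mathbf{C}$ into two terms; i.e.,

$$\mathbf{P} = \big[ \mathbf{X}^{old}_{(1)}, \mathbf{X}^{new}_{(1)} \big] \left( \begin{bmatrix} \mathbf{C}_{old} \\ \mathbf{C}_{new} \end{bmatrix} \odot \mathbf{B} \right) = \mathbf{X}^{old}_{(1)}(\mathbf{C}_{old} \odot \mathbf{B}) + \mathbf{X}^{new}_{(1)}(\mathbf{C}_{new} \odot \mathbf{B}). \tag{23}$$

Since $\mathbf{B}$ is fixed as $\mathbf{B}_{old}$, the first term of equation (23) contains only the information of the original tensor, which can be expressed by

$$\mathbf{P}_{old} = \mathbf{X}^{old}_{(1)}(\mathbf{C}_{old} \odot \mathbf{B}_{old}), \tag{24}$$

so equation (23) is rewritten as

$$\mathbf{P} = \mathbf{P}_{old} + \mathbf{X}^{new}_{(1)}(\mathbf{C}_{new} \odot \mathbf{B}_{old}). \tag{25}$$

Hence, $\mathbf{P}$ can be updated from $\mathbf{P}_{old}$ using the mode-1 unfolding matrix of $\mathcal{X}_{new}$ and the factor matrix $\mathbf{C}_{new}$ obtained above. Generally, $\mathbf{P}_{old}$ is initialized from a small front part of the sequence and then updated iteratively by (25). Analogously, the update of $\mathbf{Q}$ can be represented by

$$\mathbf{Q} = \mathbf{Q}_{old} + \mathbf{C}_{new}^{T}\mathbf{C}_{new} \circledast \mathbf{B}_{old}^{T}\mathbf{B}_{old}, \qquad \mathbf{Q}_{old} = \mathbf{C}_{old}^{T}\mathbf{C}_{old} \circledast \mathbf{B}_{old}^{T}\mathbf{B}_{old}. \tag{26}$$

The update of $\mathbf{A}$ may be summarized as

$$\mathbf{P} \leftarrow \mathbf{P}_{old} + \mathbf{X}^{new}_{(1)}(\mathbf{C}_{new} \odot \mathbf{B}_{old}), \quad \mathbf{Q} \leftarrow \mathbf{Q}_{old} + \mathbf{C}_{new}^{T}\mathbf{C}_{new} \circledast \mathbf{B}_{old}^{T}\mathbf{B}_{old}, \quad \mathbf{A} = \mathbf{P}\mathbf{Q}^{-1}. \tag{27}$$

Finally, the update of the factor matrix $\mathbf{B}$ may likewise be expressed by

$$\mathbf{U} \leftarrow \mathbf{U}_{old} + \mathbf{X}^{new}_{(2)}(\mathbf{C}_{new} \odot \mathbf{A}), \quad \mathbf{V} \leftarrow \mathbf{V}_{old} + \mathbf{C}_{new}^{T}\mathbf{C}_{new} \circledast \mathbf{A}^{T}\mathbf{A}, \quad \mathbf{B} = \mathbf{U}\mathbf{V}^{-1}, \tag{28}$$

where $\mathbf{U}_{old} = \mathbf{X}^{old}_{(2)}(\mathbf{C}_{old} \odot \mathbf{A}_{old})$ and $\mathbf{V}_{old} = \mathbf{C}_{old}^{T}\mathbf{C}_{old} \circledast \mathbf{A}_{old}^{T}\mathbf{A}_{old}$.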

To make the process clearer, the proposed CTCF of third-order tensor is summarized by Algorithm 1.

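Input: original 3D cumulative tensor $\mathcal{X}_{old}$, new tensor $\mathcal{X}_{new}$
 Step 1: new tensor $\mathcal{X}_{new}$ is added in the time dimension and $\mathcal{X}$ is obtained
 Step 2: decompose $\mathcal{X}_{old}$ by CP factorization
 Step 3: update $\mathbf{C}$ by (19), with $\mathbf{A}$ and $\mathbf{B}$ fixed
 Step 4: update $\mathbf{A}$ by (27), with $\mathbf{B}$ and $\mathbf{C}$ fixed
 Step 5: update $\mathbf{B}$ by (28), with $\mathbf{A}$ and $\mathbf{C}$ fixed
 Step 6: estimate $\hat{\mathcal{X}}$ by updated $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$
Output: approximation $\hat{\mathcal{X}}$ of updated cumulative tensor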
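
Steps 3 to 6 of Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: it assumes the incremental scheme in which two pairs of auxiliary matrices (called `P`, `Q` for the first spatial factor and `U`, `V` for the second here) accumulate the contribution of every appended block of frames, and it uses pseudoinverses in place of explicit inverses. The names `init_state` and `ctcf_update` are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with the earlier remaining index varying fastest."""
    return np.reshape(np.moveaxis(tensor, mode, 0),
                      (tensor.shape[mode], -1), order="F")

def khatri_rao(a, b):
    """Column-wise Kronecker product."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def init_state(x_old, a, b, c):
    """Auxiliary matrices built once from the initial tensor and its factors."""
    return {
        "P": unfold(x_old, 0) @ khatri_rao(c, b),
        "Q": (c.T @ c) * (b.T @ b),
        "U": unfold(x_old, 1) @ khatri_rao(c, a),
        "V": (c.T @ c) * (a.T @ a),
    }

def ctcf_update(x_new, a, b, c, state):
    """Append x_new (I x J x t_new) along time and refresh A, B, C."""
    # Step 3: temporal rows of the new frames, spatial factors fixed.
    c_new = unfold(x_new, 2) @ np.linalg.pinv(khatri_rao(b, a).T)
    c = np.vstack([c, c_new])
    # Step 4: first spatial factor from the accumulated P and Q.
    state["P"] = state["P"] + unfold(x_new, 0) @ khatri_rao(c_new, b)
    state["Q"] = state["Q"] + (c_new.T @ c_new) * (b.T @ b)
    a = state["P"] @ np.linalg.pinv(state["Q"])
    # Step 5: second spatial factor from the accumulated U and V.
    state["U"] = state["U"] + unfold(x_new, 1) @ khatri_rao(c_new, a)
    state["V"] = state["V"] + (c_new.T @ c_new) * (a.T @ a)
    b = state["U"] @ np.linalg.pinv(state["V"])
    # Step 6: CP approximation of the newly added frames.
    x_new_hat = np.einsum("ir,jr,kr->ijk", a, b, c_new)
    return a, b, c, x_new_hat

# Toy check on an exactly rank-2 sequence of 7 frames.
rng = np.random.default_rng(0)
a0, b0 = rng.standard_normal((5, 2)), rng.standard_normal((6, 2))
c_all = rng.standard_normal((7, 2))
video = np.einsum("ir,jr,kr->ijk", a0, b0, c_all)
state = init_state(video[:, :, :6], a0, b0, c_all[:6])
a1, b1, c1, frame_hat = ctcf_update(video[:, :, 6:7], a0, b0, c_all[:6], state)
print(np.allclose(frame_hat, video[:, :, 6:7]))
```

The cost of one update depends on the size of the new block and on the rank, not on the number of frames already accumulated, which is the point of avoiding a full ALS rerun.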
2.2.3. CTCF of Nth-Order Tensor

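On the basis of Section 2.2.2, we extend CTCF to higher-order tensors. Suppose an N-dimensional cumulative tensor $\mathcal{X}_{old} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times t_{old}}$, where the last dimension is the temporal dimension. The CP factorization of $\mathcal{X}_{old}$ is represented as $[\![\mathbf{A}^{(1)}_{old}, \ldots, \mathbf{A}^{(N)}_{old}]\!]$. When a new tensor $\mathcal{X}_{new} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times t_{new}}$ is added in the time dimension, the updated cumulative tensor is $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times (t_{old} + t_{new})}$, of which the CP factorization is denoted by $[\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]$.

Similar to Section 2.2.2, the temporal factor matrix $\mathbf{A}^{(N)}$ is updated first with the other matrices fixed. Like (17), the optimization problem of estimating $\mathbf{A}^{(N)}$ is formulated by

$$\mathbf{A}^{(N)} \leftarrow \arg\min_{\mathbf{A}^{(N)}} \frac{1}{2} \Big\| \mathbf{X}_{(N)} - \mathbf{A}^{(N)} \Big( \bigodot_{k \neq N} \mathbf{A}^{(k)} \Big)^{T} \Big\|_F^2. \tag{29}$$

We also separate the original part from the newly added part; i.e.,

$$\mathbf{A}^{(N)} \leftarrow \arg\min_{\mathbf{A}^{(N,1)}, \mathbf{A}^{(N,2)}} \frac{1}{2} \left\| \begin{bmatrix} \mathbf{X}^{old}_{(N)} - \mathbf{A}^{(N,1)} \big( \bigodot_{k \neq N} \mathbf{A}^{(k)} \big)^{T} \\ \mathbf{X}^{new}_{(N)} - \mathbf{A}^{(N,2)} \big( \bigodot_{k \neq N} \mathbf{A}^{(k)} \big)^{T} \end{bmatrix} \right\|_F^2. \tag{30}$$

The original part is minimized by keeping $\mathbf{A}^{(N,1)} = \mathbf{A}^{(N)}_{old}$, and the new part is given by $\mathbf{A}^{(N)}_{new} = \mathbf{X}^{new}_{(N)} \big( ( \bigodot_{k \neq N} \mathbf{A}^{(k)}_{old} )^{T} \big)^{\dagger}$.

The updates of the nontemporal factor matrices $\mathbf{A}^{(n)}$ ($n = 1, \ldots, N-1$) follow those of the factor matrices $\mathbf{A}$ and $\mathbf{B}$ in Section 2.2.2. The loss function for estimating $\mathbf{A}^{(n)}$ is the same as (16). Let $\partial L / \partial \mathbf{A}^{(n)} = 0$ and introduce matrices $\mathbf{P}^{(n)}$ and $\mathbf{Q}^{(n)}$; the update of $\mathbf{A}^{(n)}$ may be summarized as

$$\mathbf{P}^{(n)} \leftarrow \mathbf{P}^{(n)}_{old} + \mathbf{X}^{new}_{(n)} \Big( \mathbf{A}^{(N)}_{new} \odot \bigodot_{k \neq n, N} \mathbf{A}^{(k)} \Big), \quad \mathbf{Q}^{(n)} \leftarrow \mathbf{Q}^{(n)}_{old} + \mathbf{A}^{(N)T}_{new}\mathbf{A}^{(N)}_{new} \circledast \mathop{\circledast}_{k \neq n, N} \mathbf{A}^{(k)T}\mathbf{A}^{(k)}, \quad \mathbf{A}^{(n)} = \mathbf{P}^{(n)} \big( \mathbf{Q}^{(n)} \big)^{-1}, \tag{31}$$

where $\mathbf{P}^{(n)}_{old} = \mathbf{X}^{old}_{(n)} \big( \bigodot_{k \neq n} \mathbf{A}^{(k)}_{old} \big)$ and $\mathbf{Q}^{(n)}_{old} = \mathop{\circledast}_{k \neq n} \mathbf{A}^{(k)T}_{old}\mathbf{A}^{(k)}_{old}$.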

2.2.4. CTCF-Based Detection Method

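In HSV, the sequential data are expressed as a 4D cumulative tensor whose temporal dimension grows as new frames are added. Whenever a new frame arrives, the CP factorization of the original cumulative tensor is updated to obtain the factor matrices of the new cumulative tensor, and the CP tensor approximation of the newly added frame is obtained at the same time. If the target is absent, the CP tensor approximation leads to a small error, since the background is similar between adjacent frames. Conversely, if the error is large, the target is likely to be present. We define the fitness between the new frame and its approximation in (34). If the fitness is smaller than the threshold, the target is assumed to appear in the new frame. Otherwise, the new frame is added in the temporal dimension and used to update the original cumulative tensor.

The original 4D cumulative tensor is denoted by $\mathcal{X}_{old} \in \mathbb{R}^{I \times J \times K \times t_{old}}$; $t_{old}$ denotes the number of frames of the initial video. The factor matrices of the four dimensions are represented as

$$\mathcal{X}_{old} \approx [\![\mathbf{A}_{old}, \mathbf{B}_{old}, \mathbf{C}_{old}, \mathbf{D}_{old}]\!], \tag{32}$$

where $\mathbf{A}_{old} \in \mathbb{R}^{I \times R}$, $\mathbf{B}_{old} \in \mathbb{R}^{J \times R}$, $\mathbf{C}_{old} \in \mathbb{R}^{K \times R}$, and $\mathbf{D}_{old} \in \mathbb{R}^{t_{old} \times R}$, and $R$ denotes the number of component rank-one tensors in the CP factorization. When a new frame $\mathcal{X}_{new} \in \mathbb{R}^{I \times J \times K \times 1}$ is added in the temporal dimension of the original 4D cumulative tensor, the 4D cumulative tensor is updated and denoted by $\mathcal{X} \in \mathbb{R}^{I \times J \times K \times (t_{old}+1)}$. The factor matrices of $\mathcal{X}$ are expressed by

$$\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}]\!], \tag{33}$$

where $\mathbf{A} \in \mathbb{R}^{I \times R}$, $\mathbf{B} \in \mathbb{R}^{J \times R}$, $\mathbf{C} \in \mathbb{R}^{K \times R}$, and $\mathbf{D} \in \mathbb{R}^{(t_{old}+1) \times R}$. Based on Section 2.2.3, we estimate $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ and obtain the approximations $\hat{\mathcal{X}}$ and $\hat{\mathcal{X}}_{new}$ of $\mathcal{X}$ and $\mathcal{X}_{new}$, where $\hat{\mathcal{X}}_{new} = [\![\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}_{new}]\!]$. This is the specific case $N = 4$.

We define the fitness of ($\hat{\mathcal{X}}_{new}$, $\mathcal{X}_{new}$) as

$$\mathrm{fitness}(\hat{\mathcal{X}}_{new}, \mathcal{X}_{new}) = 1 - \frac{\| \mathcal{X}_{new} - \hat{\mathcal{X}}_{new} \|_F}{\| \mathcal{X}_{new} \|_F}. \tag{34}$$

If the target does not appear, the approximation error is small and the fitness is large. Given a preset threshold $\varepsilon$, when $\mathrm{fitness} > \varepsilon$, i.e., the fitness is larger than $\varepsilon$, we decide that the target is absent. Then, the nontarget frame is added in the temporal dimension and the updated 4D cumulative tensor becomes the new original 4D cumulative tensor, which can be expressed as

$$\mathcal{X}_{old} \leftarrow \mathcal{X}. \tag{35}$$

If the target appears, the approximation error is large and the fitness is smaller than $\varepsilon$. The residual between $\mathcal{X}_{new}$ and $\hat{\mathcal{X}}_{new}$ is the approximation of the target tensor $\mathcal{T}$; i.e.,

$$\mathcal{T} \approx \mathcal{X}_{new} - \hat{\mathcal{X}}_{new}. \tag{36}$$

The target of each frame is shown in 2D form by taking the maximum value of every spectrum. In this way, the proposed CTCF-based detection method can extract not only the key-frames in which the target is present, but also the approximate region of the target in every key-frame. The flowchart of the proposed method is shown in Figure 3. In Section 3, experiments on real HSV data are conducted and the proposed method is compared with some representative techniques.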
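The decision rule described above can be sketched as follows. The fitness is taken here as one minus the relative Frobenius error, which is a common definition and an assumption about the paper's exact formula; the helper names `fitness` and `detect_key_frame` are illustrative, and the default threshold of 0.9 is the value reported in the parameter settings.

```python
import numpy as np

def fitness(frame, frame_hat):
    """One minus the relative Frobenius error of the approximation."""
    return 1.0 - np.linalg.norm(frame - frame_hat) / np.linalg.norm(frame)

def detect_key_frame(frame, frame_hat, threshold=0.9):
    """Return (is_key_frame, fitness, 2D target map or None).

    frame and frame_hat are height x width x bands arrays. The target map is
    the per-pixel maximum of the residual over the spectral axis.
    """
    fit = fitness(frame, frame_hat)
    is_key = bool(fit < threshold)
    target_map = (frame - frame_hat).max(axis=2) if is_key else None
    return is_key, fit, target_map

# Toy example: flat background, then the same background plus a one-pixel plume.
background = np.ones((4, 4, 3))
plume = np.zeros_like(background)
plume[1, 1, :] = 5.0

no_target = detect_key_frame(background, background)
with_target = detect_key_frame(background + plume, background)
print(no_target[0], with_target[0])
```

In the full method, a frame that is not a key-frame would be appended to the cumulative tensor before the next frame is processed, while a key-frame is held out so that the background model is not contaminated by the target.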

2.3. The Proposed STTF-Based Super-Resolution Method

In Section 2.2, we present an approach to detect the frames where the target appears in HSV and the approximate region of the target. However, as discussed in Section 1, there has to be a tradeoff between spectral resolution and the spatial resolution in HSI imaging systems [52]. The spatial resolution is always low since high spectral resolution is required in HSIs and HSV. So, we are interested in improving the spatial resolution of targets after the detecting process. Instead of fusing HR-MSI and LR-HSI, we try to handle the target SR problem by what we have got, which is more practical in real cases.

2.3.1. Problem Formulation

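In this subsection, HSIs are represented as 3D tensors with three indexes ($i$, $j$, $k$), which stand for the height, width, and spectral modes. $\mathcal{X} \in \mathbb{R}^{H \times W \times S}$ denotes the HR-HSI and the LR-HSI is denoted by $\mathcal{Y} \in \mathbb{R}^{h \times w \times S}$, where $h < H$ and $w < W$. The goal is to estimate $\mathcal{X}$ from $\mathcal{Y}$.

There are two significant characteristics of HR-HSIs [53]: first, spectral vectors can be well approximated in low-dimensional subspaces; second, HSIs are spatially self-similar. This means that sparsity exists in both the spectral and the spatial dimensions. Inspired by sparse representation [54], the low dimensionality in the spectral domain makes it possible to form a spectral mode dictionary $\mathbf{D}_s$ with few atoms, and the self-similarity in the spatial domain guarantees sparse representations of the height and width modes with spatial dictionaries $\mathbf{D}_h$ and $\mathbf{D}_w$. In this way, the conventional Tucker factorization is transformed into the multiplication of the core tensor and the three modes dictionaries. The factorization is illustrated in Figure 4. The HR-HSI is represented as

$$\mathcal{X} = \mathcal{C} \times_1 \mathbf{D}_h \times_2 \mathbf{D}_w \times_3 \mathbf{D}_s, \tag{37}$$

where $\mathbf{D}_h \in \mathbb{R}^{H \times n_h}$, $\mathbf{D}_w \in \mathbb{R}^{W \times n_w}$, and $\mathbf{D}_s \in \mathbb{R}^{S \times n_s}$. The variables $n_h$, $n_w$, and $n_s$ denote the numbers of atoms (i.e., the numbers of columns) of $\mathbf{D}_h$, $\mathbf{D}_w$, and $\mathbf{D}_s$, respectively. The core tensor $\mathcal{C} \in \mathbb{R}^{n_h \times n_w \times n_s}$ contains the coefficients of $\mathcal{X}$ over the three modes dictionaries. We can see that (37) incorporates the information of the separate modes into a unified framework.

The LR key-frame of the HSV can be seen as the spatially downsampled version of the HR-HSI $\mathcal{X}$, which is written as

$$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{P}_h \times_2 \mathbf{P}_w, \tag{38}$$

where $\mathbf{P}_h \in \mathbb{R}^{h \times H}$ and $\mathbf{P}_w \in \mathbb{R}^{w \times W}$ are the downsampling matrices of the height and width modes. Substituting (37) into (38), $\mathcal{Y}$ is represented by

$$\mathcal{Y} = \mathcal{C} \times_1 (\mathbf{P}_h\mathbf{D}_h) \times_2 (\mathbf{P}_w\mathbf{D}_w) \times_3 \mathbf{D}_s = \mathcal{C} \times_1 \mathbf{D}_h^{*} \times_2 \mathbf{D}_w^{*} \times_3 \mathbf{D}_s, \tag{39}$$

where $\mathbf{D}_h^{*} = \mathbf{P}_h\mathbf{D}_h$ and $\mathbf{D}_w^{*} = \mathbf{P}_w\mathbf{D}_w$ denote the downsampled dictionaries of the height and width modes. To recover $\mathcal{X}$, we focus on estimating the dictionaries $\mathbf{D}_h$, $\mathbf{D}_w$, and $\mathbf{D}_s$ and the core tensor $\mathcal{C}$.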
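The substitution step, in which downsampling the HR tensor is equivalent to downsampling only the two spatial dictionaries, can be verified numerically. The sketch below is illustrative: the block-averaging operator built by `average_downsampler` is an assumed choice of downsampling matrix (the text does not specify one), and all sizes and names are made up for the example.

```python
import numpy as np

def tucker(core, d_h, d_w, d_s):
    """Core tensor multiplied by the height, width, and spectral dictionaries."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, d_h, d_w, d_s)

def average_downsampler(n_high, factor):
    """(n_high / factor) x n_high matrix averaging each block of `factor` rows.

    Assumes n_high is divisible by factor.
    """
    n_low = n_high // factor
    return np.kron(np.eye(n_low), np.full((1, factor), 1.0 / factor))

rng = np.random.default_rng(0)
H, W, S = 8, 8, 5            # HR height, width, and number of bands
n_h, n_w, n_s = 6, 6, 3      # atoms per dictionary
d_h = rng.standard_normal((H, n_h))
d_w = rng.standard_normal((W, n_w))
d_s = rng.standard_normal((S, n_s))
core = rng.standard_normal((n_h, n_w, n_s))

p_h = average_downsampler(H, 2)
p_w = average_downsampler(W, 2)

x_hr = tucker(core, d_h, d_w, d_s)
# Downsample the HR tensor along the two spatial modes.
y_from_x = np.einsum("ijk,ai,bj->abk", x_hr, p_h, p_w)
# Same LR tensor obtained from the downsampled spatial dictionaries.
y_from_dicts = tucker(core, p_h @ d_h, p_w @ d_w, d_s)
print(np.allclose(y_from_x, y_from_dicts))
```

This equivalence is what allows the dictionaries and the core tensor to be estimated from the LR frame alone and then recombined at full resolution.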

2.3.2. The Proposed STTF-Based SR Algorithm

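Since $\mathcal{Y}$ is a downsampled version of $\mathcal{X}$, recovering $\mathcal{X}$ from $\mathcal{Y}$ is a typical inverse problem, which is badly ill-posed. So, some prior knowledge of $\mathcal{X}$ is needed to regularize the super-resolution problem. In HSI processing, spectral sparsity is a widespread regularizer applied to solve a variety of ill-posed problems [55–58]. In such regularization, spectral vectors are expressed as linear combinations of a small number of distinct spectral signatures. However, these schemes only take advantage of sparsity in the spectral domain. In the proposed algorithm, taking into account the self-similarity of HSIs, sparsity regularization is extended to the spatial domain by exploiting the sparse-based tensor Tucker factorization (STTF). In STTF, the HR-HSI is given a joint sparse representation in terms of the core tensor and the three modes dictionaries.

On the basis of equation (39), HSV frame super-resolution is formulated as a constrained least-squares optimization problem:

$$\min_{\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}} \big\| \mathcal{Y} - \mathcal{C} \times_1 \mathbf{D}_h^{*} \times_2 \mathbf{D}_w^{*} \times_3 \mathbf{D}_s \big\|_F^2 \quad \text{s.t.} \quad \|\mathcal{C}\|_0 \leq N, \tag{40}$$

where $\|\cdot\|_F$ represents the Frobenius norm and $N$ denotes the maximum number of nonzero elements of $\mathcal{C}$. Because of the $\ell_0$-norm constraint, equation (40) is nonconvex. To make the optimization tractable, the $\ell_0$-norm is replaced by the $\ell_1$-norm and (40) is transformed into an unconstrained version:

$$\min_{\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}} \big\| \mathcal{Y} - \mathcal{C} \times_1 \mathbf{D}_h^{*} \times_2 \mathbf{D}_w^{*} \times_3 \mathbf{D}_s \big\|_F^2 + \lambda \|\mathcal{C}\|_1, \tag{41}$$

where $\lambda$ is the parameter of the sparse regularizer. Equation (41) is also nonconvex, and the solutions for $\mathbf{D}_h$, $\mathbf{D}_w$, $\mathbf{D}_s$, and $\mathcal{C}$ are not unique. Nonetheless, if we focus on only one variable with the other variables fixed, the objective function in equation (41) is convex. Inspired by [59, 60], equation (41) can be solved by a proximal alternating optimization scheme, which is guaranteed to converge under certain conditions. Concretely, $\mathbf{D}_h$, $\mathbf{D}_w$, $\mathbf{D}_s$, and $\mathcal{C}$ are updated iteratively by

$$\begin{aligned} \mathbf{D}_h &= \arg\min_{\mathbf{D}_h} f(\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}) + \beta \| \mathbf{D}_h - \mathbf{D}_h^{pre} \|_F^2, \\ \mathbf{D}_w &= \arg\min_{\mathbf{D}_w} f(\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}) + \beta \| \mathbf{D}_w - \mathbf{D}_w^{pre} \|_F^2, \\ \mathbf{D}_s &= \arg\min_{\mathbf{D}_s} f(\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}) + \beta \| \mathbf{D}_s - \mathbf{D}_s^{pre} \|_F^2, \\ \mathcal{C} &= \arg\min_{\mathcal{C}} f(\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C}) + \beta \| \mathcal{C} - \mathcal{C}^{pre} \|_F^2, \end{aligned} \tag{42}$$

where $(\cdot)^{pre}$ denotes the estimate from the previous iteration and $\beta$ denotes a positive number. Equation (41) defines the objective function $f(\mathbf{D}_h, \mathbf{D}_w, \mathbf{D}_s, \mathcal{C})$. The optimizations of $\mathbf{D}_h$, $\mathbf{D}_w$, $\mathbf{D}_s$, and $\mathcal{C}$ are presented in detail in the appendix. The conjugate gradient (CG) method [61] and the alternating direction method of multipliers (ADMM) [62] are used in the optimizations.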
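To illustrate how the $\ell_1$ regularizer yields a sparse core tensor when the dictionaries are held fixed, the sketch below minimizes the data term plus the $\ell_1$ penalty over the core tensor with the iterative shrinkage-thresholding algorithm (ISTA). This is deliberately a different and simpler solver than the ADMM and CG routines used in the paper, it omits the proximal term weighted by the previous estimate, and the function names and toy sizes are assumptions made for the example.

```python
import numpy as np

def tucker(core, d_h, d_w, d_s):
    """Core tensor multiplied by one dictionary along each mode."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, d_h, d_w, d_s)

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(y, core, d_h, d_w, d_s, lam):
    """Squared Frobenius data term plus the l1 penalty on the core."""
    return np.sum((y - tucker(core, d_h, d_w, d_s)) ** 2) + lam * np.abs(core).sum()

def ista_core(y, d_h, d_w, d_s, lam, n_iter=50):
    """ISTA on the core tensor with fixed dictionaries."""
    # Lipschitz constant of the gradient of the data term.
    lip = 2.0 * (np.linalg.norm(d_h, 2) * np.linalg.norm(d_w, 2)
                 * np.linalg.norm(d_s, 2)) ** 2
    core = np.zeros((d_h.shape[1], d_w.shape[1], d_s.shape[1]))
    history = [objective(y, core, d_h, d_w, d_s, lam)]
    for _ in range(n_iter):
        residual = tucker(core, d_h, d_w, d_s) - y
        # Gradient: residual multiplied by the transposed dictionaries.
        grad = 2.0 * np.einsum("ijk,ip,jq,kr->pqr", residual, d_h, d_w, d_s)
        core = soft_threshold(core - grad / lip, lam / lip)
        history.append(objective(y, core, d_h, d_w, d_s, lam))
    return core, history

rng = np.random.default_rng(0)
d_h = rng.standard_normal((6, 4))
d_w = rng.standard_normal((6, 4))
d_s = rng.standard_normal((5, 3))
core_true = np.zeros((4, 4, 3))
core_true[0, 1, 2], core_true[2, 0, 1] = 3.0, -2.0
y = tucker(core_true, d_h, d_w, d_s)

core_est, history = ista_core(y, d_h, d_w, d_s, lam=0.1, n_iter=30)
print(history[0], history[-1])
```

With a step size of one over the Lipschitz constant, the objective is non-increasing across iterations, which makes the history a simple sanity check; ADMM typically converges in far fewer iterations on problems of realistic size.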

2.3.3. Initialization of the Proposed Method

Since the optimization problem in (41) is nonconvex, a careless initialization may lead to a poor local minimum. In this paper, we initialize the spatial dictionaries $\mathbf{D}_h$ and $\mathbf{D}_w$ from $\mathcal{Y}$ with the dictionary-updates-cycles KSVD (DUC-KSVD) method [63], which promotes sparse representations. The spectral dictionary $\mathbf{D}_s$ is then initialized by the simplex identification via split augmented Lagrangian (SISAL) algorithm [64], which efficiently identifies a minimum-volume simplex that contains the spectral vectors.

The proposed STTF-based SR algorithm is summarized in Algorithm 2.

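Input: LR-HSI $\mathcal{Y}$
 Initialize $\mathbf{D}_s$ with SISAL
 Initialize $\mathbf{D}_h$ and $\mathbf{D}_w$ with DUC-KSVD
 Initialize $\mathcal{C}$ with (39)
 while no convergence do
  Step 1: update $\mathbf{D}_h$ by solving (A.3) with CG
  Step 2: update $\mathbf{D}_w$ by solving (A.6) with CG
  Step 3: update $\mathbf{D}_s$ by solving (A.9) with CG
  Step 4: update $\mathcal{C}$ by solving (A.15) with CG
 end while
 Estimate $\mathcal{X}$ by (37)
Output: HR-HSI $\mathcal{X}$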

3. Results and Discussion

3.1. Experimental Data Set

To highlight the advantages of HSIs, we choose invisible gas plume to be the target. The proposed algorithms can be extended to other types of data reasonably. In this section, the HSV data set is acquired by the infrared imaging spectrometer “HyperCam-LW.” Sulfur hexafluoride (SF6) is chosen to be the target, since it is a kind of odorless and colorless gas plume with a distinct absorption peak in LWIR range. The HSV data set consists of 60 infrared hyperspectral frames with the size of . The imaging interval is 4.8 s, and the wavelength of the data ranges from 7.8 μm to 11.8 μm.

In SR method, only the middle pixels are used in the experiment (specifically, column 71 to column 198) for reasons connected with the algorithm process. And we remove the spectral band 41–127 because of water vapor absorption and extremely low SNR. At last, the size of input LR-HSI is .

3.2. Compared Methods

For CTCF-based detection method, we compare it with two representative methods: MSD (matched subspace detector) [25] and CMF (constrained matrix factorization) [29]. For STTF-based SR method, we compare it with three state-of-the-art algorithms: bicubic interpolation, sparse representation-based SR method [54], and sequence information-based SR method [65].

3.3. Qualitative and Quantitative Metrics

For detection methods, receiver operating characteristic (ROC) curves [66] are used to evaluate the performance. Generally, a detector outperforms another one if the area under its ROC curve is larger [67]. As suggested in [68], the area under the ROC curve (AUC) is also calculated as a measure of performance of these detection methods. Usually, a better detector gets a higher AUC value.

For SR algorithms, since we directly process the LR-HSI, there is no original HR-HSI (i.e., the ground truth) for reference. Thus, some popular quantitative metrics are not available, such as RMSE (root-mean-square error) [69], PSNR (peak signal to noise ratio), and SAM (spectral angle mapper). In this section, entropy and average gradient are introduced to evaluate the performance of SR methods.

3.3.1. Entropy

Super-resolution aims to introduce more useful information into images, so we may measure the performance of SR methods by calculating the information contained in the experimental results. The entropy is defined as

$$E = -\sum_{i=0}^{L-1} p_i \log_2 p_i. \tag{43}$$

The probability of a pixel with grey value $i$ in the image is denoted by $p_i$, and $L$ denotes the number of grey values in the range $[0, L-1]$. The larger the entropy value of the image, the richer the information contained in the image.
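A minimal NumPy sketch of this metric is given below, assuming an integer-valued greyscale image and base-2 logarithms (so the entropy is in bits); the function name `image_entropy` and the default of 256 grey levels are illustrative assumptions.

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy (in bits) of the grey-level histogram of an integer image."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                     # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((4, 4), dtype=np.uint8)                  # a single grey value
two_level = np.array([[0, 255], [0, 255]], dtype=np.uint8)
all_levels = np.arange(256, dtype=np.uint8).reshape(16, 16)
print(image_entropy(flat), image_entropy(two_level), image_entropy(all_levels))
```

A constant image carries no information, two equally likely grey values give one bit, and a uniform histogram over 256 levels reaches the maximum of eight bits.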

3.3.2. Average Gradient

Another way to assess super-resolution performance is the change in the amount of detail in the image. We may evaluate the experimental results by the average gradient, since it reflects how well details are expressed and measures the clarity of the image. The gradient increases if the grey level varies quickly in one direction of the image. The average gradient is formulated as

$$\bar{G} = \frac{1}{(M-1)(N-1)} \sum_{i=1}^{M-1} \sum_{j=1}^{N-1} \sqrt{\frac{\big( f(i+1, j) - f(i, j) \big)^2 + \big( f(i, j+1) - f(i, j) \big)^2}{2}}, \tag{44}$$

where $M$ and $N$ denote the height and width of the image, respectively, and $f(i, j)$ denotes the greyscale value of pixel $(i, j)$ in the image. The larger the average gradient value of the image is, the clearer the image will be.
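The average gradient can be computed with forward differences as in the sketch below. Several variants of this metric exist (they differ in normalization and in the difference scheme), so the exact form used here, with forward differences averaged over the (M-1) x (N-1) interior grid, is an assumption; the function name `average_gradient` is illustrative.

```python
import numpy as np

def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) using forward differences."""
    f = np.asarray(img, dtype=float)
    dx = f[1:, :-1] - f[:-1, :-1]    # difference along the height axis
    dy = f[:-1, 1:] - f[:-1, :-1]    # difference along the width axis
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))

rows, cols = np.indices((5, 6))
print(average_gradient(np.full((5, 5), 7)),   # constant image
      average_gradient(rows),                 # ramp along one axis
      average_gradient(rows + cols))          # ramp along both axes
```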

Besides, the visual quality of output images is an important qualitative metric.

3.4. Parameters Setting

In MSD, we pick 463 spectra of the gas target and 846 spectra of the background from the 12th frame of the HSV to build the training set. The size of the target subspace and background space is and , respectively. In CMF, the number of endmembers is 3, the sparsity of the factor matrices is 2, and the number of iterations is 3. In the proposed CTCF-based method, the original cumulative tensor is obtained by ALS, the tensor rank is 3, the maximum iteration number is 100, and the reconstruction error is $10^{-8}$; in the update stage, the threshold of fitness is 0.9. In the proposed STTF-based SR method, the number of iterations is 5; the parameter $\beta$ is the weight in (42) and we set ; parameter $\lambda$ controls the sparsity of $\mathcal{C}$; we set ; parameter is set by ; the size of is set by , , and . The parameters above were chosen after a sufficient number of experiments to balance efficiency and stability.

3.5. Experimental Results and Discussion

In this subsection, we show the experimental results of the various methods for detection and super-resolution.

After processing the HSV with the proposed CTCF-based method, we compute the Frobenius norm of each frame; the values are presented in Figure 5. The target gas clearly appears in the 12th frame and disappears in the 51st frame. Figure 6 compares the ROC curves of the tested methods on four frames in detail, and Figure 7 illustrates the general trends of the ROC curves of MSD, CMF, and CTCF, respectively. As can be seen from Figures 6 and 7, the proposed CTCF-based detection algorithm outperforms the other two methods. The AUC values of the three approaches are shown in Table 1. In each row, the bold value represents the highest AUC value. Although the AUC values of CMF are better in some frames, they are very low (less than 0.98) in some other frames. In contrast, all the results of CTCF lie in the range of 0.98 to 1. From the average value and the variance (the bold value represents the highest value), we can conclude that the proposed method is superior and more stable. The graphical results are illustrated in Figure 8.

The target of each key-frame is shown in 2D form (grey image) by taking the maximum value of every spectrum. To save the length of the paper, we choose 8 frames to show the comparison of three detectors, which are shown in Figure 9. The first row to the eighth row present the detection result of the chosen frame, of which the frame number is 15, 18, 22, 28, 31, 39, 48, and 50. The higher the greyscale of the pixel in the image is, the closer it is to the target. It is apparent that our method extracts more accurate targets.

Table 2 shows the entropy and average gradient of the key-frames produced by the four SR algorithms. Since the sequence-based method needs 5 LR frames to form 1 HR frame, the range of compared frames changes from 12∼50 to 14∼48. In each row, the bold values represent the highest entropy and the highest average gradient. From Table 2, we can draw three conclusions. First, although interpolation adds more information to the frame, the details of the target are lost. Second, sparse representation SR and sequence information SR have almost the same entropy, but the latter offers more details because its HR dictionary is formed from several LR dictionaries. Finally, the proposed STTF-based SR method outperforms the other three methods in both metrics.
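The two no-reference metrics in Table 2 can be sketched as follows. The paper does not give its exact formulas in this section, so the definitions used here (Shannon entropy of a 256-bin grey-level histogram, and the mean of sqrt((Δx² + Δy²)/2) over forward differences) are common conventions assumed for illustration, as are the function names.

```python
import numpy as np


def image_entropy(image, levels=256):
    """Shannon entropy in bits of the grey-level histogram of a 2D image."""
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return 0.0  # a constant image carries no information
    hist, _ = np.histogram(img, bins=levels, range=(lo, hi))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def average_gradient(image):
    """Mean magnitude of forward differences; larger values suggest sharper detail."""
    img = np.asarray(image, dtype=float)
    dx = img[:-1, 1:] - img[:-1, :-1]  # horizontal forward difference
    dy = img[1:, :-1] - img[:-1, :-1]  # vertical forward difference
    return float(np.sqrt((dx ** 2 + dy ** 2) / 2.0).mean())
```

Under these definitions, interpolation can raise the entropy by introducing intermediate grey levels while leaving the average gradient low, which is consistent with the first observation drawn from Table 2.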

Figure 10 presents the visual quality of the results obtained by the four tested methods. We choose the 16th, 21st, 34th, and 47th frames as representatives. The smaller one with size of is the LR 2D-form frame. The bigger ones with size of are the SR results of the different algorithms. As can be seen from Figure 10, the proposed approach yields clearer outputs with sharper edges and more texture. One drawback is the “checkerboard artifacts,” which may be caused by the deconvolution operations in the method. We plan to address this in future work.

4. Conclusions

In this paper, we propose, for hyperspectral video (HSV), a novel key-frame and target detection method based on cumulative tensor CP factorization, termed CTCF, and a super-resolution (SR) algorithm based on sparse-based tensor Tucker factorization, called STTF. Unlike conventional matrix factorization based methods, CTCF treats the HSV as a 4D cumulative tensor and approximates newly added frames by updating the factor matrices. To overcome the limits of conventional methods and make SR more practical, STTF exploits the sparsity of HSV frames and factorizes each of them as a sparse core tensor multiplied by dictionaries along three modes. In this way, the spatial resolution of the LR-HSI is enhanced directly without HR samples. The experimental results demonstrate that the proposed CTCF and STTF methods outperform the compared state-of-the-art algorithms.

In future work, we will focus on tensor factorization based target tracking methods that can extract the target region more accurately and clearly. For super-resolution, we aim to exploit nonlocal similarities, which have been widely used in inverse problems, within the tensor factorization framework. Besides target tracking and super-resolution, region of interest (ROI) approaches will be investigated to make HSV target recognition more efficient and full featured. Inspired by [70] and other related works, we believe that research on chemical gas detection methods will benefit the agricultural application of HSI/HSV. These studies will be of great significance for the internet of things (IoT), smart agriculture, pollution monitoring, etc.

Appendix

The optimizations of , , , and in Section 2.3.2 are presented as follows.

(1) Optimization of : when , , and are fixed, the optimization of in (42) is represented as

where denotes the previous estimation of the height mode dictionary in the last iteration. Using the properties of the n-mode product (see (3)), (A.1) is represented as

where denotes the mode-1 unfolding matrix of and . Equation (A.2) is quadratic and can be solved by computing a general Sylvester matrix equation; i.e.,

The conjugate gradient (CG) method is utilized to solve (A.3). Under certain conditions, CG converges after several iterations. In our experiments, we found that the solution of (A.3) is well approximated after 20 iterations.

(2) Optimization of : when , , and are fixed, the optimization of in (42) is expressed by

where denotes the previous estimation of the width mode dictionary in the last iteration. Similar to the optimization of , (A.4) can be transformed into

where denotes the mode-2 unfolding matrix of and . Equation (A.5) is also quadratic and can be solved by computing a general Sylvester matrix equation; i.e.,

Likewise, CG is used to solve (A.6).

(3) Optimization of : when , , and are fixed, the optimization with respect to in (42) can be formulated as

where denotes the previous estimation of the spectral mode dictionary in the last iteration. As in the two items above, we have

where denotes the mode-3 unfolding matrix of and . Similarly, (A.8) can be solved by computing a general Sylvester matrix equation; i.e.,

We apply CG to solve (A.9), and convergence is achieved in a few iterations.

(4) Optimization of : when , , and are fixed, the optimization of in (42) can be written as

where denotes the previous estimation of the core tensor in the last iteration. Equation (A.10) is convex, so we can employ ADMM to solve the optimization problem. Introducing splitting variables and , (A.10) can be transformed into the equivalent constrained form:

where

Equation (A.11) is a typical form of optimization problem that corresponds to the standard ADMM. The augmented Lagrangian function for (A.11) is represented as

where denotes the Lagrangian multiplier and denotes the penalty parameter. The process of ADMM is formulated as

Here, the optimizations of and are independent because function is decoupled with respect to these variables. Next, (A.14) is discussed in more detail.

(i) Update : based on (A.13), we have

and the closed-form solution of (A.15) is

where .

(ii) Update : based on (A.13), we have

Based on (6) and (7), (A.17) is equivalent to

where the vectors , , , and are the vectorization forms of tensors , , , and , respectively, and matrix . Equation (A.18) has the closed-form solution which is denoted by

However, is so large that (A.19) is too expensive to solve directly. We rewrite the first term of (A.19) as follows:

where and () denote eigenvector matrices and eigenvalue matrices of , , and , respectively. So, is diagonal and can be computed easily. Moreover, the operations of and of are i-mode products, and the multiplication in (A.20) is elementwise. Finally, in the second term of (A.19) can be computed by

(iii) Update : based on (A.14), is updated by
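The eigendecomposition step in item (ii) can be illustrated with a small NumPy sketch. Since the symbols of (A.19) and (A.20) are not reproduced here, the sketch assumes a system of the form (DᵀD + μI) vec(C) = vec(R) with D = D₃ ⊗ D₂ ⊗ D₁; the names `d1`, `d2`, `d3`, `mu`, `rhs`, and both function names are hypothetical and not taken from the paper. The point shown is only that the large inverse reduces to mode products with the eigenvector matrices and an elementwise division.

```python
import numpy as np


def multi_mode_product(tensor, m1, m2, m3):
    """Compute tensor x_1 m1 x_2 m2 x_3 m3 for a third-order tensor."""
    return np.einsum('ia,jb,kc,abc->ijk', m1, m2, m3, tensor)


def solve_kronecker_system(rhs, d1, d2, d3, mu):
    """Solve (D^T D + mu*I) vec(C) = vec(rhs) with D = d3 kron d2 kron d1.

    The Kronecker matrix is never formed: each Gram matrix d_i^T d_i is
    eigendecomposed, so the system becomes diagonal in the joint eigenbasis.
    """
    (s1, u1), (s2, u2), (s3, u3) = [np.linalg.eigh(d.T @ d) for d in (d1, d2, d3)]
    # Rotate the right-hand side into the joint eigenbasis (mode products).
    rotated = multi_mode_product(rhs, u1.T, u2.T, u3.T)
    # Diagonal system: products of eigenvalues plus mu, applied elementwise.
    denom = s1[:, None, None] * s2[None, :, None] * s3[None, None, :] + mu
    # Rotate back to the original basis.
    return multi_mode_product(rotated / denom, u1, u2, u3)
```

For dictionaries with a few hundred atoms per mode, this replaces a dense solve whose size is the product of the three atom counts with three small eigendecompositions, which is why the core-tensor update stays tractable.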

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The authors would like to thank Professor Gu from Heilongjiang Province Key Laboratory of Space-Air-Ground Integrated Intelligent Remote Sensing for his selfless help. This work was supported by the National Natural Science Foundation of China (Grant no. 61671184) and the National Natural Science Foundation of Key International Cooperation of China (Grant no. 61720106002).