Abstract
Local tangent space alignment (LTSA) is a well-known manifold learning algorithm, and many other manifold learning algorithms have been developed on its basis. From the viewpoint of dimensionality reduction, however, LTSA preserves only local features. What the dimensionality reduction community is now pursuing are algorithms capable of preserving both local and global features at the same time. In this paper, a new dimensionality reduction algorithm, called HSIC-regularized LTSA (HSIC–LTSA), is proposed, in which an HSIC regularization term is added to the objective function of LTSA. HSIC stands for the Hilbert–Schmidt independence criterion and has been used in many machine learning applications. However, HSIC has so far neither been applied directly to dimensionality reduction nor been used as a regularization term in combination with other machine learning algorithms. The proposed HSIC–LTSA is therefore a new attempt for both HSIC and LTSA. In HSIC–LTSA, HSIC makes the high- and low-dimensional data as statistically dependent as possible, while LTSA reduces the data dimension under the local homeomorphism-preserving criterion. The experimental results presented in this paper show that, on several commonly used datasets, HSIC–LTSA performs better than LTSA as well as some state-of-the-art local- and global-preserving algorithms.
References
van der Maaten LJP, Postma EO, van den Herik HJ (2007) Dimensionality reduction: a comparative review. J Mach Learn Res 10(1):66–71
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Schölkopf B, Smola A, Müller K-R (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Weinberger KQ, Sha F, Saul LK (2004) Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the twenty-first international conference on Machine learning. ACM
Lafon S, Lee AB (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans Pattern Anal Mach Intell 28(9):1393–1403
Zhang Z, Zha H (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J Sci Comput 26(1):313–338
He X, Niyogi P (2003) Locality preserving projections. Adv Neural Inf Process Syst 16(1):186–197
Chen J, Ma Z, Liu Y (2013) Local coordinates alignment with global preservation for dimensionality reduction. IEEE Trans Neural Netw Learn Syst 24(1):106–117
Liu X et al (2014) Global and local structure preservation for feature selection. IEEE Trans Neural Netw Learn Syst 25(6):1083–1095
Gretton A et al (2005) Measuring statistical dependence with Hilbert–Schmidt norms. In: International conference on algorithmic learning theory. Springer, Berlin
Yan K, Kou L, Zhang D (2017) Learning domain-invariant subspace using domain features and independence maximization. IEEE Trans Cybern 48:288–299
Damodaran BB, Courty N, Lefèvre S (2017) Sparse Hilbert Schmidt independence criterion and surrogate-kernel-based feature selection for hyperspectral image classification. IEEE Trans Geosci Remote Sens 55(4):2385–2398
Gangeh MJ, Zarkoob H, Ghodsi A (2017) Fast and scalable feature selection for gene expression data using Hilbert–Schmidt independence criterion. IEEE ACM Trans Comput Biol Bioinform 14(1):167–181
Xiao M, Guo Y (2015) Feature space independent semi-supervised domain adaptation via kernel matching. IEEE Trans Pattern Anal Mach Intell 37(1):54–66
Zhong W et al (2010) Incorporating the loss function into discriminative clustering of structured outputs. IEEE Trans Neural Netw 21(10):1564–1575
Boothby WM (2007) An introduction to differentiable manifolds and Riemannian geometry. Elsevier (Singapore) Pte Ltd., Singapore
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Donoho DL, Grimes C (2003) Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 100(10):5591–5596
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14(6):585–591
He X, Yan S, Hu Y, Niyogi P, Zhang H (2005) Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
Pang Y, Zhang L, Liu Z, Yu N, Li H (2005) Neighborhood preserving projections (NPP): a novel linear dimension reduction method. Proc ICIC Pattern Anal Mach Intell 1:117–125
Cai D, He X, Han J, Zhang H (2006) Orthogonal Laplacianfaces for face recognition. IEEE Trans Image Process 15(11):3608–3614
Kokiopoulou E, Saad Y (2007) Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique. IEEE Trans Pattern Anal Mach Intell 29(12):2143–2156
Yan S, Xu D, Zhang B, Zhang H, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Saul LK, Roweis ST (2003) Think globally, fit locally: unsupervised learning of low dimensional manifold. J Mach Learn Res 4(1):119–155
Qiao H, Zhang P, Wang D, Zhang B (2013) An explicit nonlinear mapping for manifold learning. IEEE Trans Cybern 43(1):51–63
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(1):2399–2434
Jost J (2008) Riemannian geometry and geometric analysis. Springer, Berlin
Spivak M (1981) A comprehensive introduction to differential geometry, vol 4. Publish or Perish
Kreyszig E (1981) Introductory functional analysis with applications. Wiley, New York
Mika S et al (1999) Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Gohberg I, Goldberg S, Kaashoek MA (1990) Hilbert–Schmidt operators. In: Classes of linear operators, vol 1. Birkhäuser, Basel, pp 138–147
Xiang S et al (2011) Regression reformulations of LLE and LTSA with locally linear transformation. IEEE Trans Syst Man Cybern Part B (Cybern) 41(5):1250–1262
Martin Sagayam K, Jude Hemanth D (2018) ABC algorithm based optimization of 1-D hidden Markov model for hand gesture recognition application. Comput Ind 99:313–323
Sagayam KM, Hemanth DJ, Ramprasad YN, Menon R (2018) Optimization of hand motion recognition system based on 2D HMM approach using ABC algorithm. In: Hybrid intelligent techniques for pattern analysis and understanding. Chapman and Hall, New York
Sagayam KM, Hemanth DJ (2018) Comparative analysis of 1-D HMM and 2-D HMM for hand motion recognition applications. In: Progress in intelligent computing techniques: theory, practice, and applications, advances in intelligent systems and computing. Springer, p 518
Gangeh MJ, Ghodsi A, Kamel MS (2013) Kernelized supervised dictionary learning. IEEE Trans Signal Process 61(19):4753–4767
Gangeh MJ, Fewzee P, Ghodsi A, Kamel MS, Karray F (2014) Multiview supervised dictionary learning in speech emotion recognition. IEEE ACM Trans Audio Speech Lang Process 22(6):1056–1068
Barshan E et al (2011) Supervised PCA visualization classification and regression on subspaces and submanifolds. Pattern Recognit 44:1357–1371
Acknowledgements
We would like to express our sincere appreciation to the anonymous reviewers for their insightful comments, which have greatly aided us in improving the quality of the paper.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Reproducing kernel Hilbert Spaces (RKHS)
HSIC is based on RKHS. Let \(L^{2}\left( \varOmega \right) =\left\{ f\left| f:\varOmega \rightarrow R,\int \limits _{\varOmega }{{{\left| f\left( x \right) \right| }^{2}}}\,dx<+\infty \right. \right\}\) be the space of square-integrable functions. An inner product \(\left\langle \bullet ,\bullet \right\rangle\) can be defined over \(L^{2}\left( \varOmega \right)\) [30]:
\[\left\langle f,g \right\rangle =\int \limits _{\varOmega }{f\left( x \right) g\left( x \right) }\,dx.\]
It can be proven that \(H=\left( L^{2}\left( \varOmega \right) ,\left\langle \bullet ,\bullet \right\rangle \right)\) is a complete inner product space, i.e., a Hilbert space.
Definition
[30] Let \(H=\left( L^{2}\left( \varOmega \right) ,\left\langle \bullet ,\bullet \right\rangle \right)\); if there is a function \(k:\varOmega \times \varOmega \rightarrow R\) such that
For all \(x\in \varOmega\),\({{k}_{x}}=k\left( \bullet ,x \right) \in H\);
For all \(f\in H\), \(f\left( x \right) =\left\langle f,k\left( \bullet ,x \right) \right\rangle\)
then H is called a reproducing kernel Hilbert space (RKHS) and k called the reproducing kernel of H.
The reproducing kernel k can be used to define a map \(\varphi :\varOmega \rightarrow H\) such that for all \(x\in \varOmega\),
\[\varphi \left( x \right) ={{k}_{x}}=k\left( \bullet ,x \right) .\]
It can be easily proven that, for all \(x,y\in \varOmega\),
\[\left\langle \varphi \left( x \right) ,\varphi \left( y \right) \right\rangle =k\left( x,y \right) .\]
The above equation is often used in many kernel methods of machine learning such as kPCA [3], kLDA [31], kSVM [32], etc.
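For a concrete illustration of this identity, consider the homogeneous polynomial kernel of degree 2 on \(R^{2}\), for which the feature map \(\varphi\) can be written down explicitly. The sketch below (the names k_poly2 and phi are illustrative, not from the paper) checks \(\left\langle \varphi \left( x \right) ,\varphi \left( y \right) \right\rangle =k\left( x,y \right)\) numerically:

```python
import numpy as np

def k_poly2(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map for k_poly2 on R^2; here H is simply R^3."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# <phi(x), phi(y)> equals k(x, y): the kernel evaluates the inner product
# in H without ever constructing phi explicitly (the "kernel trick").
print(k_poly2(x, y))                   # (1*3 + 2*(-1))^2 = 1.0
print(float(np.dot(phi(x), phi(y))))   # 1.0 as well
```

For kernels such as the Gaussian kernel, \(\varphi\) maps into an infinite-dimensional H, and this shortcut is what makes the kernel methods listed above computationally feasible.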
Furthermore, if X is a random variable on \(\varOmega\), then \(\varphi \left( X \right)\) is a random process and its mean function \(\mu\) is defined by
\[\mu =E\left[ \varphi \left( X \right) \right] .\]
Then, for all \(f\in H\),
\[\left\langle f,\mu \right\rangle =E\left[ f\left( X \right) \right] .\]
In mathematics, it can be proven that an RKHS can be generated from a kernel function. The definition of a kernel function is as follows:
Definition
[33] Let \(k:\varOmega \times \varOmega \rightarrow R\), if k satisfies the following conditions:
Symmetric: for all \(x,y\in \varOmega\), \(k\left( x,y \right) =k\left( y,x \right)\)
Square integrable: for all \(x\in \varOmega\), \({{k}_{x}}=k\left( \bullet ,x \right)\) is square integrable
Positive definite: for all \({{x}_{1}},\ldots ,{{x}_{N}}\in \varOmega\), the matrix \(\left[ \begin{array}{lll} k\left( {{x}_{1}},{{x}_{1}} \right) &{} \ldots &{} k\left( {{x}_{1}},{{x}_{N}} \right) \\ \vdots &{} \ddots &{} \vdots \\ k\left( {{x}_{N}},{{x}_{1}} \right) &{} \ldots &{} k\left( {{x}_{N}},{{x}_{N}} \right) \\ \end{array} \right]\) is positive definite,
then k is called a kernel function.
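These three conditions can be checked numerically for a given kernel on a finite sample. The sketch below does so for the Gaussian (RBF) kernel, a standard example of a kernel function (the variable names and the choice of sigma are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 sample points in R^3

# Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
sigma = 1.0
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))

# Symmetry: K[i, j] = k(x_i, x_j) = k(x_j, x_i) = K[j, i].
print(np.allclose(K, K.T))
# Positive definiteness: all eigenvalues of the Gram matrix are positive
# (up to numerical tolerance) for distinct sample points.
print(np.min(np.linalg.eigvalsh(K)))
```

For distinct points the Gaussian Gram matrix is strictly positive definite, so the smallest eigenvalue printed is positive.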
Remark
Kernel functions and reproducing kernels are not the same concept: a kernel function is defined by its own properties (symmetry, square integrability, positive definiteness), whereas a reproducing kernel is defined relative to a given RKHS.
Theorem
[30] A kernel function k can be used to generate a unique RKHS \({{H}_{k}}\) such that k becomes the reproducing kernel of \({{H}_{k}}\).
Based on this theorem, once a kernel function is determined, an RKHS is determined as well.
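The HSIC regularizer discussed in the abstract builds on this machinery: it measures the dependence between two samples through their Gram matrices. As a minimal sketch (the function and variable names are illustrative, and linear kernels are used only for brevity), the biased empirical HSIC estimator of Gretton et al. [13] can be written as:

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC estimate tr(K H L H) / (n - 1)^2, where
    H = I - (1/n) 1 1^T is the centering matrix (Gretton et al. [13])."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X[:, :2] + 0.1 * rng.normal(size=(100, 2))  # strongly dependent on X
Z = rng.normal(size=(100, 2))                   # independent of X

gram = lambda A: A @ A.T  # linear-kernel Gram matrix

# Dependent data yield a much larger empirical HSIC than independent data,
# which is why maximizing HSIC pushes the low-dimensional embedding to stay
# statistically dependent on the high-dimensional input.
print(hsic(gram(X), gram(Y)), hsic(gram(X), gram(Z)))
```

In HSIC–LTSA itself, one Gram matrix is computed from the high-dimensional data and the other from the embedding, and the estimator above enters the LTSA objective as a regularization term.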
Cite this article
Zheng, X., Ma, Z. & Li, L. Local tangent space alignment based on Hilbert–Schmidt independence criterion regularization. Pattern Anal Applic 23, 855–868 (2020). https://doi.org/10.1007/s10044-019-00810-6