Abstract
A growing number of problems in data analysis and classification involve data that are non-Euclidean. For such problems, a naive application of vector space analysis algorithms will produce results that depend on the choice of local coordinates used to parametrize the data. At the same time, many data analysis and classification problems eventually reduce to an optimization, in which the criteria being minimized can be interpreted as the distortion associated with a mapping between two curved spaces. Exploiting this distortion minimizing perspective, we first show that manifold learning problems involving non-Euclidean data can be naturally framed as seeking a mapping between two Riemannian manifolds that is closest to being an isometry. A family of coordinate-invariant first-order distortion measures is then proposed to measure the proximity of the mapping to an isometry; these measures are applied to manifold learning for non-Euclidean data sets. Case studies ranging from synthetic data to human mass-shape data demonstrate the many performance advantages of our Riemannian distortion minimization framework.
Notes
A useful analogy is the problem of making two-dimensional Cartesian maps of the earth: given a set of data points sampled from the earth’s surface, a two-dimensional surface—in this case a sphere—is first fitted to these points, and a Cartesian map of the sphere that best preserves distances and angles is then sought.
Recall that the spectral norm of a square matrix A is the positive square root of the maximum eigenvalue of \(A^{\top } A\). It can also be verified that if \(\lambda _i\) is an eigenvalue of \(J^{\top } H J G^{-1}\), then \(\lambda _i-1\) is an eigenvalue of \(J^{\top } H J G^{-1}-I\).
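Both facts are easy to check numerically; the following sketch uses random matrices (not tied to any quantity in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Spectral norm: positive square root of the largest eigenvalue of A^T A.
spec_from_eig = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
assert np.isclose(spec_from_eig, np.linalg.norm(A, 2))

# Eigenvalue shift: if lam is an eigenvalue of B, then lam - 1 is an
# eigenvalue of B - I (B and B - I share eigenvectors).
B = rng.standard_normal((4, 4))
lams = np.sort(np.linalg.eigvals(B))
lams_shifted = np.sort(np.linalg.eigvals(B - np.eye(4)))
assert np.allclose(lams, lams_shifted + 1)
```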
The kernel function defined on Riemannian manifolds as in (11) is known not to be positive-definite in general (Jayasumana et al. 2015; Feragen et al. 2015). However, since our manifold learning purposes mainly require capturing the submanifold on which the data points lie, we do not require positive-definiteness of the kernel.
Such a choice for weights is based on the approximation \({\tilde{d}}_i \approx c'\frac{\sqrt{\det G}}{\rho }(x_i)\) for a constant \(c'>0\), where \(\rho : \mathcal {M} \rightarrow {\mathbb {R}}\) is the underlying probability density generating data \(x_i\), satisfying \(\rho (x)\ge 0\) for all \(x \in {\mathbb {R}}^m\) and \(\int _\mathcal {M} \rho (x) \ dx = 1\). We refer the reader to equation (A.1.27) in Appendix A.1 of Jang (2019) for this approximation.
References
Barahona S, Gual-Arnau X, Ibáñez MV, Simó A (2018) Unsupervised classification of children’s bodies using currents. Adv Data Anal Classif 12(2):365–397
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Boothby WM (1986) An introduction to differentiable manifolds and Riemannian geometry, vol 120. Academic Press, Cambridge
Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 34(4):18–42
Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmonic Anal 21(1):5–30
Desbrun M, Meyer M, Alliez P (2002) Intrinsic parameterizations of surface meshes. Comput Graph Forum 21:209–218
Donoho DL, Grimes C (2003) Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 100(10):5591–5596
Dubrovin BA, Fomenko AT, Novikov SP (1992) Modern geometry-methods and applications Part I. The geometry of surfaces, transformation groups, and fields. Springer, Berlin
Eells J, Lemaire L (1978) A report on harmonic maps. Bull London Math Soc 10(1):1–68
Eells J, Lemaire L (1988) Another report on harmonic maps. Bull London Math Soc 20(5):385–524
Eells J, Sampson JH (1964) Harmonic mappings of Riemannian manifolds. Am J Math 86(1):109–160
Feragen A, Lauze F, Hauberg S (2015) Geodesic exponential kernels: when curvature and linearity conflict. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3032–3042
Fletcher PT, Joshi S (2007) Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process 87(2):250–262
Goldberg Y, Zakai A, Kushnir D, Ritov Y (2008) Manifold learning: the price of normalization. J Mach Learn Res 9:1909–1939
Gu X, Wang Y, Chan TF, Thompson PM, Yau ST (2004) Genus zero surface conformal mapping and its application to brain surface mapping. IEEE Trans Med Imag 23(8):949–958
Jang C (2019) Riemannian distortion measures for non-Euclidean data. Ph.D. thesis, Seoul National University
Jayasumana S, Hartley R, Salzmann M, Li H, Harandi M (2015) Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Trans Pattern Anal Mach Intell 37(12):2464–2477
Lafon SS (2004) Diffusion maps and geometric harmonics. Ph.D. thesis, Yale University
Lee T, Park FC (2018) A geometric algorithm for robust multibody inertial parameter identification. IEEE Robot Autom Lett 3(3):2455–2462
Lin B, He X, Ye J (2015) A geometric viewpoint of manifold learning. Appl Inform 2:3. https://doi.org/10.1186/s40535-015-0006-6
McQueen J, Meila M, Perrault-Joncas D (2016) Nearly isometric embedding by relaxation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in Neural Information Processing Systems, pp 2631–2639
Mullen P, Tong Y, Alliez P, Desbrun M (2008) Spectral conformal parameterization. Comput Graph Forum 27:1487–1494
Park FC, Brockett RW (1994) Kinematic dexterity of robotic mechanisms. Int J Robot Res 13(1):1–15
Pelletier B (2005) Kernel density estimation on Riemannian manifolds. Stat Probab Lett 73(3):297–304
Perrault-Joncas D, Meila M (2013) Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery. arXiv preprint arXiv:1305.7255
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Steinke F, Hein M, Schölkopf B (2010) Nonparametric regression between general Riemannian manifolds. SIAM J Imaging Sci 3(3):527–563
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Vinué G, Simó A, Alemany S (2016) The \(k\)-means algorithm for 3d shapes with an application to apparel design. Adv Data Anal Classif 10(1):103–132
Wensing PM, Kim S, Slotine JJE (2018) Linear matrix inequalities for physically consistent inertial parameter identification: a statistical perspective on the mass distribution. IEEE Robot Autom Lett 3(1):60–67
Yang Y, Yu Y, Zhou Y, Du S, Davis J, Yang R (2014) Semantic parametric reshaping of human body models. In: Proceedings of the 2nd International Conference on 3D Vision (3DV), IEEE, vol 2, pp 41–48
Zhang T, Li X, Tao D, Yang J (2008) Local coordinates alignment (LCA): a novel manifold learning approach. Int J Pattern Recogn Artif Intell 22(04):667–690
Zhang Z, Zha H (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J Sci Comput 26(1):313–338
Cheongjae Jang and Frank Chongwoo Park were supported in part by the NAVER LABS’ AMBIDEX Project, MSIT-IITP (2019-0-01367, BabyMind), SNU-IAMD, SNU BK21+ Program in Mechanical Engineering, SNU Institute for Engineering Research, the National Research Foundation of Korea (NRF-2016R1A5A1938472), the Technology Innovation Program (ATC+, 20008547) funded by the Ministry of Trade, Industry, and Energy (MOTIE, Korea), and SNU BMRR Grant DAPAUD190018ID. Yung-Kyun Noh was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1901-13 and by Hanyang University (HY-2019). (Corresponding author: Frank Chongwoo Park.)
Appendices
A: Further mathematical details of manifold learning algorithms
A.1 Proof of Proposition 1
Proof
The inverse metric \(JG^{-1}J^{\top }\) at \(x = x_i\) is obtained in (14) as
where \(Y = \begin{bmatrix} y_1, \ldots , y_N \end{bmatrix} \in {\mathbb {R}}^{n\times N}\) is the matrix representation of the embeddings, \(L = \frac{1}{ch}( {\tilde{D}}^{-1}{\tilde{K}} - I) \in {\mathbb {R}}^{N\times N}\) is the normalized graph Laplacian (\({\tilde{D}}, {\tilde{K}}\in {\mathbb {R}}^{N\times N}\) are obtained from Algorithm 1 and both \(c,h>0\)), \(L_i \in {\mathbb {R}}^N\) is the i-th row of L, and \(e_i = (0,\ldots ,1,\ldots ,0) \in {\mathbb {R}}^N\) is a standard basis vector whose i-th component is one.
To verify that \(J G^{-1} J^{\top }(x_i)\) is positive semi-definite, it suffices to show that
is positive semi-definite. For any \(v = (v_1, \ldots , v_N) \in {\mathbb {R}}^N\),
where \(L_{ij}\) denotes the (i, j) entry of L in (22). In deriving (23)-(24), we use the equalities \(L_{ij} = \frac{1}{ch}(({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} - \delta _{ij})\) (\(\delta _{ij} = 1\) if \(i=j\) and 0 otherwise) and \(\sum _{j=1}^N ({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} = 1\), together with the inequality \({\tilde{K}}_{ij} \ge 0\) for \(i,j = 1,\ldots ,N\). Since \(v^\top M_i v \ge 0\) for all \(v \in {\mathbb {R}}^N\), \(M_i\) is positive semi-definite; hence \(JG^{-1} J^\top (x_i) = \frac{1}{2ch} Y M_i Y^\top \) is also positive semi-definite for each \(i = 1, \ldots , N\). \(\square \)
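The claim can also be checked numerically. The sketch below assumes the inverse-metric estimate at \(x_i\) takes the form \(\frac{1}{2}\sum_j L_{ij}(y_j - y_i)(y_j - y_i)^{\top}\) (the estimator of Perrault-Joncas and Meila 2013, which the computation above mirrors); the Gaussian kernel and constants are illustrative stand-ins for Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, n, c, h = 30, 2, 3, 1.0, 0.2

X = rng.standard_normal((N, m))   # data in local coordinates
Y = rng.standard_normal((n, N))   # an arbitrary candidate embedding

# Kernel and normalized graph Laplacian L = (1/(ch)) (D^{-1} K - I),
# schematically following Algorithm 1 (the Gaussian form is assumed).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / h)
D = K.sum(1)
L = ((K / D[:, None]) - np.eye(N)) / (c * h)

for i in range(N):
    # Inverse-metric estimate at x_i: 0.5 * sum_j L_ij (y_j - y_i)(y_j - y_i)^T.
    diff = Y - Y[:, [i]]
    Hinv = 0.5 * (L[i] * diff) @ diff.T
    # Positive semi-definiteness, as claimed in Proposition 1.
    assert np.linalg.eigvalsh(Hinv).min() > -1e-10
```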
A.2 Riemannian relaxation
In the Riemannian relaxation method of McQueen et al. (2016), \({{\mathcal {M}}}\) is chosen to be an m-dimensional submanifold of Euclidean ambient space \({\mathbb {R}}^D\), with Riemannian metric G corresponding to the Euclidean metric on \({\mathbb {R}}^D\) projected to \({{\mathcal {M}}}\). The target manifold \({{\mathcal {N}}}\) is set to be \({\mathbb {R}}^n\) for some a priori chosen dimension \(n \ge \text{ dim }({{\mathcal {M}}})\); the Riemannian metric on \({{\mathcal {N}}}\) is set to \(H=I\).
Given Euclidean data points \(u_i \in {\mathbb {R}}^D\), \(i = 1, \ldots , N\) (\(x_i \in {\mathbb {R}}^m\) in local coordinates), denote their n-dimensional embeddings by \(y_i \in {\mathbb {R}}^n\). The embedding is then obtained as the solution to the following optimization:
where \(JG^{-1}J^{\top }(u_i)\) denotes the \(JG^{-1}J^{\top }\) estimated at \(u_i\) using the method presented in Perrault-Joncas and Meila (2013), \(\Vert \cdot \Vert \) denotes the matrix spectral norm, and \(\alpha _i\) are weights. If \(n > \text {dim}(\mathcal {M})\), I in (25) is replaced by \(R_m R_m^{\top }\), where \(R_m = [r_1, \ldots , r_m] \in {\mathbb {R}}^{n\times m}\) with \(r_i \in {\mathbb {R}}^n\) the i-th singular vector of \(JG^{-1}J^{\top }\).
From the perspective of our Riemannian distortion framework, assuming the rank of \(J G^{-1} J^{\top }\) is m and the weights \(\alpha _i\) in (25) are set to \({\tilde{d}}_{i}\) (obtained from Algorithm 1), the objective function in (25) can be expressed as
where the \(\lambda _i\) are the m nonzero eigenvalues of \(J G^{-1} J^{\top }\), which are identical to those of \(J^{\top } J G^{-1}\). Since in practice the numerical estimation of \(J G^{-1} J^{\top } \in {\mathbb {R}}^{n\times n}\) may yield a rank higher than m when \(n > \text{ dim }({{\mathcal {M}}})\), one solution is to impose a soft constraint on the rank of \(J G^{-1} J^{\top }\), e.g., in McQueen et al. (2016) the optimization is formulated as
where \(\lambda _i\) are the eigenvalues of \(J G^{-1} J^{\top }\), \(\mathcal {I}_m\) denotes the set of indices of the m largest eigenvalues, and \(\epsilon >0\) is a scalar parameter intended to suppress the smaller \((n-m)\) eigenvalues.
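The structure of this objective can be sketched as follows. The exact aggregation and rank-suppression scheme of McQueen et al. (2016) may differ, so treat this as an illustration of a weighted spectral-norm loss with suppression of the trailing eigenvalues, not their implementation:

```python
import numpy as np

def relaxation_loss(Hinv_list, alphas, eps=1e-3, m=None):
    """Schematic Riemannian-relaxation objective: a weighted sum of spectral
    norms ||J G^{-1} J^T - I||, where the smaller (n - m) eigenvalues of each
    estimate are pushed toward zero when the target dimension n exceeds m.
    A sketch of the objective's structure, not McQueen et al.'s code."""
    total = 0.0
    for Hinv, a in zip(Hinv_list, alphas):
        lams = np.linalg.eigvalsh(Hinv)[::-1]        # eigenvalues, descending
        if m is None or m == len(lams):
            total += a * np.abs(lams - 1.0).max()    # spectral norm of Hinv - I
        else:
            top = np.abs(lams[:m] - 1.0).max()       # leading m: match isometry
            rest = eps * np.abs(lams[m:]).max()      # trailing n-m: suppress
            total += a * max(top, rest)
    return total
```

An exact isometry (all estimates equal to the identity) gives zero loss, as expected.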
A.3 Proof of Proposition 2
Proof
For \(H = I\), the discretized formulation of the harmonic mapping distortion in the form of (15) is obtained as follows:
where \({\tilde{K}}, {\tilde{d}}_i, {\tilde{D}}\) are obtained from Algorithm 1, \(L = \frac{1}{ch} ({\tilde{D}}^{-1}{\tilde{K}} - I)\in {\mathbb {R}}^{N \times N}\) is the graph Laplacian from Algorithm 1, \(L_i\) is the i-th row of L, \(e_i = (0,\ldots ,1,\ldots ,0) \in {\mathbb {R}}^N\) is a standard basis vector whose i-th component is one, and \(\mathbb {1}_N \in {\mathbb {R}}^N\) denotes an N-dimensional vector whose components are all one. In deriving (28)–(31), we use the estimate of \(JG^{-1}J^{\top }\) at \(x_i\) in (14) and the equalities \(\text {Tr}(J^{\top } HJG^{-1}) = \text {Tr}(JG^{-1}J^{\top }H)\) and \(\mathbb {1}_N^{\top } {\tilde{D}} L = 0\).
Given a constant matrix \(Y_b\) specified by the boundary condition, minimizing (31) for \(Y_r\) reduces to
A closed-form solution for \(Y_r\) is obtained as
where \(W = {\tilde{K}}_{br} ({\tilde{D}}_{rr}-{\tilde{K}}_{rr})^{-1} \in {\mathbb {R}}^{N_b \times N_r}\).
Assume that \({\tilde{K}}_{ij} = {\tilde{K}}_{ji} \ge 0\) for all \(i, j = 1, \ldots , N\), that the graph with \({\tilde{K}}_{rr}\) as its adjacency matrix is connected, and that \({\tilde{K}}_{br}\) is not a zero matrix. Then the matrix \(({\tilde{D}}_{rr}-{\tilde{K}}_{rr})\) is positive-definite, so that W is well-defined. The positive-definiteness of \(({\tilde{D}}_{rr}-{\tilde{K}}_{rr})\) can be shown from the following inequality: for any \(v = (v_1, \ldots , v_{N_r})\ne 0 \in {\mathbb {R}}^{N_r}\),
where we use the fact that \(({\tilde{D}}_{rr})_{ii} = \sum _{k=1}^{N_b} ({\tilde{K}}_{br})_{ki} + \sum _{k = 1}^{N_r} ({\tilde{K}}_{rr})_{ki}\) in deriving (36). From the direct application of Cramer’s rule, it can be shown that each entry of \(({\tilde{D}}_{rr} - {\tilde{K}}_{rr})^{-1}\) is non-negative. Since every entry of \({\tilde{K}}_{br}\) is non-negative, all the entries of W are also non-negative. Furthermore, W satisfies the equation \(\mathbb {1}_{N_r}^{\top } = \mathbb {1}_{N_b}^{\top } W\) from the equality \({\tilde{D}}_{rr} \mathbb {1}_{N_r} = {\tilde{K}}_{rr} \mathbb {1}_{N_r} + {\tilde{K}}_{br}^{\top } \mathbb {1}_{N_b}\); hence the entries of each column of W sum to one. \(\square \)
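The two properties of W established above, entrywise non-negativity and unit column sums, can be checked numerically with a random symmetric kernel satisfying the proof's assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Nb = 12, 4                      # total points, boundary points
Nr = N - Nb

# Symmetric non-negative kernel over all points (the setting assumed above).
A = rng.random((N, N))
K = (A + A.T) / 2
np.fill_diagonal(K, 0.0)

K_rr = K[Nb:, Nb:]                 # interior-interior block
K_br = K[:Nb, Nb:]                 # boundary-interior block
# Degrees of interior points: (D_rr)_ii = sum_k (K_br)_ki + sum_k (K_rr)_ki.
D_rr = np.diag(K_br.sum(0) + K_rr.sum(0))

W = K_br @ np.linalg.inv(D_rr - K_rr)

assert (W >= -1e-12).all()                    # all entries are non-negative
assert np.allclose(W.sum(0), np.ones(Nr))     # each column of W sums to one
```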
B: Experimental details for Section 4
B.1 Swiss roll
Here we explain further experimental details for the case study performed in Sect. 4.1. The data points are non-uniformly sampled; referring to the unfolded manifold in Fig. 3a, the density is set to oscillate along the horizontal axis, while uniform along the vertical axis. When choosing the initial parameter value \(\theta _0\) for Algorithm 2, locality-preserving embeddings are preferable. As such an initial parameter value, we use the two-dimensional embedding obtained from Isomap (Tenenbaum et al. 2000). Any other embedding that preserves locality can also be used as an initial guess, e.g., those from locally linear embedding (LLE; Roweis and Saul 2000), Laplacian eigenmap (LE; Belkin and Niyogi 2003), diffusion map (DM; Coifman and Lafon 2006), Hessian eigenmap (HLLE; Donoho and Grimes 2003), or local tangent space alignment (LTSA; Zhang and Zha 2004).
For the embedding obtained from the Isomap method, we test five different scalings of it as the initial parameter value \(\theta _0\) for Algorithm 3; we then choose the output embeddings that best match the pairwise distances between the ten nearest neighbors in the ambient space. Also note that the kernel bandwidth parameter h for the approximation of the graph Laplacian in Algorithm 2 is chosen to be of the same order as the average nearest-neighbor distance of the data points, following Lafon (2004).
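A minimal sketch of the non-uniform sampling scheme via rejection sampling; the specific oscillation profile, parameter ranges, and roll shape below are illustrative assumptions, not the experiment's exact settings:

```python
import numpy as np

def sample_swiss_roll(N, seed=0):
    """Sample a Swiss roll whose density oscillates along the unrolled
    horizontal axis and is uniform along the vertical axis."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < N:
        t = rng.uniform(1.5 * np.pi, 4.5 * np.pi)   # horizontal (roll) parameter
        # Rejection step: oscillating acceptance probability in [0.1, 1.0].
        if rng.uniform() > 0.55 + 0.45 * np.sin(4.0 * t):
            continue
        z = rng.uniform(0.0, 10.0)                  # vertical axis: uniform
        samples.append((t * np.cos(t), z, t * np.sin(t)))
    return np.array(samples)

X = sample_swiss_roll(500)
```

An Isomap embedding of `X` (e.g., via scikit-learn's `Isomap`) could then serve as the locality-preserving initial guess \(\theta_0\) described above.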
B.2 Synthetic P(2) data
B.2.1 Details for the submanifold considered in Section 4.2
The tangent space of \(\text {P(n)}\) at any \(P \in \text {P(n)}\) can be identified with \(\text {S(n)}\), the space of \(n \times n\) symmetric matrices. Given \(X, Y \in \text {S(n)}\), the affine-invariant Riemannian metric at P is defined by the inner product
Consider the following orthogonal decomposition of \(P \in \text {P(2)}\):
where \(R = \left[ \begin{array}{rr} \cos \theta &{} \quad -\sin \theta \\ \sin \theta &{} \quad \cos \theta \end{array} \right] \in \text {SO(2)}\) with \(\theta \in [0, \frac{\pi }{2})\), and \(S = \text {diag}(e^{p}, e^{q})\) for scalars p and q. A local coordinate chart can be defined in terms of \((p,q,\theta )\) on an open subset \(U = \{P \in \text {P(2)} \, | \, P \ne c I \ \ \text {for} \ \ c > 0\}\). The affine-invariant Riemannian metric in (37) is then represented in \((p,q,\theta )\)-coordinates (at \(p \ne q\)) as
For the case study in Sect. 4.2, the data set shown in Fig. 4a is generated by joining two cylinders (with a hole) \(\mathcal {C}_1\) and \(\mathcal {C}_2\) in \((p, q, \theta )\)-coordinates, where \(\mathcal {C}_1 = \{(p, q, \theta ) \ | \ p = \sin \theta _S, \; q = -1 + \cos \theta _S, \; \theta _S \in \left[ 0, \frac{4}{3}\pi \right] , \; \theta \in \left[ 0, \frac{\pi }{4} \right] \}\) and \(\mathcal {C}_2 = \{(p, q, \theta ) \ | \ p = \sin \theta _S, \; q = 1 - \cos \theta _S, \; \theta _S \in \left[ -\frac{4}{3}\pi , 0 \right] , \; \theta \in \left[ 0, \frac{\pi }{4} \right] \}\) (see Fig. 4a; the backbone curve in the figure corresponds to the direction along which \(\theta _S\) varies). The affine-invariant Riemannian metric on this submanifold (at \(\theta _S \ne 0\)) is obtained in terms of coordinates \((\theta , \theta _S)\), \(\theta _S \ne 0\), as
Because of the nonzero Riemannian curvature of this submanifold, isometric embeddings in two-dimensional Euclidean space do not exist.
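A short numerical sketch of the affine-invariant metric and its defining invariance property, assuming (37) takes the standard form \(\langle X, Y\rangle_P = \text{Tr}(P^{-1} X P^{-1} Y)\):

```python
import numpy as np

def affine_invariant_inner(P, X, Y):
    """Affine-invariant inner product on P(n): <X, Y>_P = Tr(P^{-1} X P^{-1} Y),
    with X, Y symmetric tangent vectors at the SPD matrix P."""
    Pinv = np.linalg.inv(P)
    return np.trace(Pinv @ X @ Pinv @ Y)

rng = np.random.default_rng(3)
B = rng.standard_normal((2, 2))
P = B @ B.T + 2 * np.eye(2)                     # a point in P(2)
X = rng.standard_normal((2, 2)); X = X + X.T    # tangent vectors in S(2)
Y = rng.standard_normal((2, 2)); Y = Y + Y.T

# Affine invariance: unchanged under P -> A P A^T, X -> A X A^T, Y -> A Y A^T.
A = rng.standard_normal((2, 2)) + 3 * np.eye(2)
lhs = affine_invariant_inner(A @ P @ A.T, A @ X @ A.T, A @ Y @ A.T)
assert np.isclose(lhs, affine_invariant_inner(P, X, Y))
```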
B.2.2 Evaluation of the pairwise distance and tangent vector angle errors
For data points \(x_i \in \text {P(2)}\) and corresponding embeddings \(y_i \in {\mathbb {R}}^2\), \(i = 1, \ldots , N\), the pairwise distance error for k nearest points is defined as
where \(y_i\) denotes the optimal embedding of \(x_i\), \(\text {NN}_{k}(i)\) denotes the set of indices of k nearest neighbor points to \(x_i\), and \(\text {dist}(x_i,x_j)\) denotes the ground truth distance between \(x_i\) and \(x_j\), i.e., the geodesic distance measured on the submanifold. To measure angles, the tangent vectors are approximated by the difference between nearest neighbors. The tangent vector angle error is defined as
where \(v_i\) and \(V_i\) denote the tangent vector from the l-th data point to the i-th data point, measured in the optimal embeddings and in the original data points, respectively, and \(\langle \cdot , \cdot \rangle \) denotes the inner product.
When reporting the final manifold learning results in Table 1, for the reference values to evaluate the pairwise distance error, we numerically obtain the minimal geodesic distances on the submanifolds. Also, the inner product in (37) is used to calculate the reference values for the angles between tangent vectors.
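A plausible implementation of the pairwise distance error; since the exact aggregation formula is not reproduced here, the averaging of relative errors below is an assumption:

```python
import numpy as np

def pairwise_distance_error(Y, geo_dist, k):
    """Average relative error between embedded Euclidean distances and
    ground-truth geodesic distances, over each point's k nearest neighbors.
    Y: (N, n) embeddings; geo_dist: (N, N) ground-truth geodesic distances."""
    N = len(Y)
    errs = []
    for i in range(N):
        nn = np.argsort(geo_dist[i])[1:k + 1]   # k nearest neighbors (skip self)
        for j in nn:
            d_emb = np.linalg.norm(Y[i] - Y[j])
            errs.append(abs(d_emb - geo_dist[i, j]) / geo_dist[i, j])
    return float(np.mean(errs))

# Sanity check: a perfectly isometric "embedding" has zero error.
Y_line = np.array([[0.0], [1.0], [2.0], [3.5]])
G_line = np.abs(Y_line - Y_line.T)
assert pairwise_distance_error(Y_line, G_line, k=2) == 0.0
```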
B.3 Human mass-inertia data
B.3.1 Synthesizing human mass-inertia data
Since mass-inertial parameter data for humans are not readily available, we use human shape data from Yang et al. (2014) to synthesize this data set; specifically, assuming uniform mass density, we integrate the volumes of the human body shapes to construct mass-inertial parameter data for the corresponding \(N_l=10\) links.
B.3.2 Further principal components of human mass-inertia data
As a supplement to Fig. 5 in Sect. 4.3, here we provide the third and fourth principal components of the human mass-inertia data obtained from both principal geodesic analysis (PGA) and vector space principal component analysis (PCA). The variations corresponding to the third and fourth principal components of PGA are shown in Fig. 8a–b. Principal component 3 captures variations in height and torso thickness, and principal component 4 captures variations mainly in height.
In the case of vector space PCA shown in Fig. 8c–d, the variations near the mean are qualitatively similar to those obtained for PGA. However, the positive-definiteness requirement is violated even for data points just 0.5 standard deviations away from the mean; the ellipsoids for those inertial parameters collapse, as indicated by the dashed red ellipses in Fig. 8c–d.
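This failure mode can be reproduced schematically: stepping linearly between SPD matrices can leave the positive-definite cone, while stepping through the matrix logarithm (a simplified stand-in for the geodesic constructions underlying PGA) cannot. The matrices below are synthetic, not the paper's data:

```python
import numpy as np

def sym_log(P):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.log(w)) @ V.T

def sym_exp(S):
    """Matrix exponential of a symmetric matrix (always SPD)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(w)) @ V.T

# Two SPD "data points": extrapolating linearly past P1 leaves the SPD cone,
# while extrapolating in log-coordinates always returns an SPD matrix.
P0 = np.diag([1.0, 1.0])
P1 = np.diag([0.3, 1.0])
t = 2.0                                   # step beyond the data

linear = P0 + t * (P1 - P0)               # vector space (PCA-style) step
geodesic = sym_exp(sym_log(P0) + t * (sym_log(P1) - sym_log(P0)))

assert np.linalg.eigvalsh(linear).min() <= 0.0     # positive-definiteness lost
assert np.linalg.eigvalsh(geodesic).min() > 0.0    # positive-definiteness kept
```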
Jang, C., Noh, YK. & Park, F.C. A Riemannian geometric framework for manifold learning of non-Euclidean data. Adv Data Anal Classif 15, 673–699 (2021). https://doi.org/10.1007/s11634-020-00426-3