A Riemannian geometric framework for manifold learning of non-Euclidean data


Abstract

A growing number of problems in data analysis and classification involve data that are non-Euclidean. For such problems, a naive application of vector space analysis algorithms will produce results that depend on the choice of local coordinates used to parametrize the data. At the same time, many data analysis and classification problems eventually reduce to an optimization, in which the criteria being minimized can be interpreted as the distortion associated with a mapping between two curved spaces. Exploiting this distortion minimizing perspective, we first show that manifold learning problems involving non-Euclidean data can be naturally framed as seeking a mapping between two Riemannian manifolds that is closest to being an isometry. A family of coordinate-invariant first-order distortion measures is then proposed that measure the proximity of the mapping to an isometry, and applied to manifold learning for non-Euclidean data sets. Case studies ranging from synthetic data to human mass-shape data demonstrate the many performance advantages of our Riemannian distortion minimization framework.


Notes

  1. A useful analogy is the problem of making two-dimensional Cartesian maps of the earth: given a set of data points sampled from the earth’s surface, a two-dimensional surface—in this case a sphere—is first fitted to these points, and a Cartesian map of the sphere that best preserves distances and angles is then sought.

  2. Recall the spectral norm of a square matrix A is the positive square root of the maximum eigenvalue of \(A^{\top } A\). It can also be verified that if \(\lambda _i\) is an eigenvalue of \(J^{\top } H J G^{-1}\), then \(\lambda _i-1\) is an eigenvalue of \(J^{\top } H J G^{-1}-I\).

  3. The kernel function defined on Riemannian manifolds as in (11) is known not to be positive-definite in general (Jayasumana et al. 2015; Feragen et al. 2015). However, since our manifold learning purposes mainly aim at capturing only the submanifold on which the data points lie, we do not require positive-definiteness of the kernel.

  4. Such a choice for weights is based on the approximation \({\tilde{d}}_i \approx c'\frac{\sqrt{\det G}}{\rho }(x_i)\) for a constant \(c'>0\), where \(\rho : \mathcal {M} \rightarrow {\mathbb {R}}\) is the underlying probability density generating data \(x_i\), satisfying \(\rho (x)\ge 0\) for all \(x \in {\mathbb {R}}^m\) and \(\int _\mathcal {M} \rho (x) \ dx = 1\). We refer the reader to equation (A.1.27) in Appendix A.1 of Jang (2019) for this approximation.

References

  • Barahona S, Gual-Arnau X, Ibáñez MV, Simó A (2018) Unsupervised classification of children’s bodies using currents. Adv Data Anal Classif 12(2):365–397

  • Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  • Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  • Boothby WM (1986) An introduction to differentiable manifolds and Riemannian geometry, vol 120. Academic Press, Cambridge

  • Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 34(4):18–42

  • Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmonic Anal 21(1):5–30

  • Desbrun M, Meyer M, Alliez P (2002) Intrinsic parameterizations of surface meshes. Comput Graph Forum 21:209–218

  • Donoho DL, Grimes C (2003) Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 100(10):5591–5596

  • Dubrovin BA, Fomenko AT, Novikov SP (1992) Modern geometry: methods and applications, Part I. The geometry of surfaces, transformation groups, and fields. Springer, Berlin

  • Eells J, Lemaire L (1978) A report on harmonic maps. Bull London Math Soc 10(1):1–68

  • Eells J, Lemaire L (1988) Another report on harmonic maps. Bull London Math Soc 20(5):385–524

  • Eells J, Sampson JH (1964) Harmonic mappings of Riemannian manifolds. Am J Math 86(1):109–160

  • Feragen A, Lauze F, Hauberg S (2015) Geodesic exponential kernels: when curvature and linearity conflict. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3032–3042

  • Fletcher PT, Joshi S (2007) Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process 87(2):250–262

  • Goldberg Y, Zakai A, Kushnir D, Ritov Y (2008) Manifold learning: the price of normalization. J Mach Learn Res 9:1909–1939

  • Gu X, Wang Y, Chan TF, Thompson PM, Yau ST (2004) Genus zero surface conformal mapping and its application to brain surface mapping. IEEE Trans Med Imag 23(8):949–958

  • Jang C (2019) Riemannian distortion measures for non-Euclidean data. PhD thesis, Seoul National University

  • Jayasumana S, Hartley R, Salzmann M, Li H, Harandi M (2015) Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Trans Pattern Anal Mach Intell 37(12):2464–2477

  • Lafon SS (2004) Diffusion maps and geometric harmonics. PhD thesis, Yale University

  • Lee T, Park FC (2018) A geometric algorithm for robust multibody inertial parameter identification. IEEE Robot Autom Lett 3(3):2455–2462

  • Lin B, He X, Ye J (2015) A geometric viewpoint of manifold learning. Appl Inform 2:3. https://doi.org/10.1186/s40535-015-0006-6

  • McQueen J, Meila M, Perrault-Joncas D (2016) Nearly isometric embedding by relaxation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems, pp 2631–2639

  • Mullen P, Tong Y, Alliez P, Desbrun M (2008) Spectral conformal parameterization. Comput Graph Forum 27:1487–1494

  • Park FC, Brockett RW (1994) Kinematic dexterity of robotic mechanisms. Int J Robot Res 13(1):1–15

  • Pelletier B (2005) Kernel density estimation on Riemannian manifolds. Stat Probab Lett 73(3):297–304

  • Perrault-Joncas D, Meila M (2013) Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery. arXiv preprint arXiv:1305.7255

  • Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

  • Steinke F, Hein M, Schölkopf B (2010) Nonparametric regression between general Riemannian manifolds. SIAM J Imag Sci 3(3):527–563

  • Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323

  • Vinué G, Simó A, Alemany S (2016) The \(k\)-means algorithm for 3D shapes with an application to apparel design. Adv Data Anal Classif 10(1):103–132

  • Wensing PM, Kim S, Slotine JJE (2018) Linear matrix inequalities for physically consistent inertial parameter identification: a statistical perspective on the mass distribution. IEEE Robot Autom Lett 3(1):60–67

  • Yang Y, Yu Y, Zhou Y, Du S, Davis J, Yang R (2014) Semantic parametric reshaping of human body models. In: 2014 2nd international conference on 3D vision (3DV), IEEE, vol 2, pp 41–48

  • Zhang T, Li X, Tao D, Yang J (2008) Local coordinates alignment (LCA): a novel manifold learning approach. Int J Pattern Recogn Artif Intell 22(04):667–690

  • Zhang Z, Zha H (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J Sci Comput 26(1):313–338


Author information


Corresponding author

Correspondence to Frank Chongwoo Park.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cheongjae Jang and Frank Chongwoo Park were supported in part by the NAVER LABS’ AMBIDEX Project, MSIT-IITP (2019-0-01367, BabyMind), SNU-IAMD, SNU BK21+ Program in Mechanical Engineering, SNU Institute for Engineering Research, the National Research Foundation of Korea (NRF-2016R1A5A1938472), the Technology Innovation Program (ATC+, 20008547) funded by the Ministry of Trade, Industry, and Energy (MOTIE, Korea), and SNU BMRR Grant DAPAUD190018ID. Yung-Kyun Noh was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1901-13 and by Hanyang University (HY-2019). (Corresponding author: Frank Chongwoo Park.).

Appendices


A: Further mathematical details of manifold learning algorithms

A.1 Proof of Proposition 1

Proof

The inverse metric \(JG^{-1}J^{\top }\) at \(x = x_i\) is obtained in (14) as

$$\begin{aligned} J G^{-1} J^{\top }(x_i) = \frac{1}{2} Y (\text {diag}(L_i) - e_i e_i^{\top } L - L^{\top } e_i e_i^{\top }) Y^{\top }, \end{aligned}$$

where \(Y = \begin{bmatrix} y_1, \ldots , y_N \end{bmatrix} \in {\mathbb {R}}^{n\times N}\) is the matrix representation of the embeddings, \(L = \frac{1}{ch}( {\tilde{D}}^{-1}{\tilde{K}} - I) \in {\mathbb {R}}^{N\times N}\) is the normalized graph Laplacian (\({\tilde{D}}, {\tilde{K}}\in {\mathbb {R}}^{N\times N}\) are obtained from Algorithm 1 and both \(c,h>0\)), \(L_i \in {\mathbb {R}}^N\) is the i-th row of L, and \(e_i = (0,\ldots ,1,\ldots ,0) \in {\mathbb {R}}^N\) is a standard basis vector whose i-th component is one.

To determine whether \(J G^{-1} J^{\top }(x_i)\) is positive semi-definite, it suffices to check whether

$$\begin{aligned} M_i \equiv c \ h \left( \text {diag}(L_i) - e_i e_i^{\top } L - L^{\top } e_i e_i^{\top } \right) \in {\mathbb {R}}^{N\times N} \end{aligned}$$
(20)

is positive semi-definite. For any \(v = (v_1, \ldots , v_N) \in {\mathbb {R}}^N\),

$$\begin{aligned} v^\top M_i v&= c \ h \left( \sum _{j,k=1}^N \left( \text {diag}(L_i) - e_i e_i^{\top } L - L^{\top } e_i e_i^{\top } \right) _{jk} v_j v_k \right) \end{aligned}$$
(21)
$$\begin{aligned}&= c \ h \left( \sum _{j=1}^N L_{ij} v_j^2 - 2 L_{ij} v_i v_j \right) \end{aligned}$$
(22)
$$\begin{aligned}&= v_i^2 + \sum _{j=1}^N ({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} (v_j^2 - 2v_i v_j) \end{aligned}$$
(23)
$$\begin{aligned}&= \sum _{j=1}^N ({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} (v_i - v_j)^2 \ge 0, \end{aligned}$$
(24)

where \(L_{ij}\) denotes the (i, j) entry of L in (22). In deriving (23)-(24), we use the equalities \(L_{ij} = \frac{1}{ch}(({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} - \delta _{ij})\) (\(\delta _{ij} = 1\) if \(i=j\) and 0 otherwise) and \(\sum _{j=1}^N ({\tilde{D}}_{ii})^{-1} {\tilde{K}}_{ij} = 1\), and also the inequality \({\tilde{K}}_{ij} \ge 0\) for \(i,j = 1,\ldots ,N\). Since the inequality \(v^\top M_i v \ge 0\) holds for any \(v \in {\mathbb {R}}^N\), \(M_i\) is positive semi-definite; then \(JG^{-1} J^\top (x_i) = \frac{1}{2ch} Y M_i Y^\top \) is also positive semi-definite for all \(i = 1, \ldots , N\). \(\square \)
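
For concreteness, the following NumPy sketch numerically checks this conclusion. The kernel construction here is only a generic stand-in for the \({\tilde{K}}\) produced by Algorithm 1 (only symmetry and nonnegativity of the entries are used in the proof), and the constants N, c, h are arbitrary.

```python
import numpy as np

# Minimal numerical check of Proposition 1: M_i is positive semi-definite.
rng = np.random.default_rng(0)
N, c, h = 50, 0.25, 0.1
X = rng.normal(size=(N, 3))
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / h)   # symmetric, K_ij >= 0
D = np.diag(K.sum(axis=1))                                          # D-tilde: row sums of K
L = (np.linalg.inv(D) @ K - np.eye(N)) / (c * h)                    # normalized graph Laplacian

i = 7                                     # any data index
e_i = np.zeros(N); e_i[i] = 1.0
E = np.outer(e_i, e_i)
M_i = c * h * (np.diag(L[i]) - E @ L - L.T @ E)                     # equation (20)

eigvals = np.linalg.eigvalsh(0.5 * (M_i + M_i.T))   # M_i is symmetric; symmetrize for safety
assert eigvals.min() > -1e-10, "M_i should be positive semi-definite"
```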

A.2 Riemannian relaxation

In the Riemannian relaxation method of McQueen et al. (2016), \({{\mathcal {M}}}\) is chosen to be an m-dimensional submanifold of Euclidean ambient space \({\mathbb {R}}^D\), with Riemannian metric G corresponding to the Euclidean metric on \({\mathbb {R}}^D\) projected to \({{\mathcal {M}}}\). The target manifold \({{\mathcal {N}}}\) is set to be \({\mathbb {R}}^n\) for some a priori chosen dimension \(n \ge \text{ dim }({{\mathcal {M}}})\); the Riemannian metric on \({{\mathcal {N}}}\) is set to \(H=I\).

Given Euclidean data points \(u_i \in {\mathbb {R}}^D\), \(i = 1, \ldots , N\) (\(x_i \in {\mathbb {R}}^m\) in local coordinates), denote their n-dimensional embeddings by \(y_i \in {\mathbb {R}}^n\). The embedding is then obtained as the solution to the following optimization:

$$\begin{aligned} \min _{y_i} \sum _{i=1}^N \Vert JG^{-1}J^{\top }(u_i) - I\Vert ^2 \alpha _i, \end{aligned}$$
(25)

where \(JG^{-1}J^{\top }(u_i)\) denotes the \(JG^{-1}J^{\top }\) estimated on \(u_i\) using the method presented in Perrault-Joncas and Meila (2013), \(\Vert \cdot \Vert \) denotes the matrix spectral norm, and \(\alpha _i\) are weights. If \(n > \text {dim}(\mathcal {M})\), I in (25) is replaced by \(R_m R_m^{\top }\), where \(R_m = [r_1, \ldots , r_m] \in {\mathbb {R}}^{n\times m}\) with \(r_i \in {\mathbb {R}}^n\) the i-th singular vector of \(JG^{-1}J^{\top }\).
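
A minimal sketch of evaluating the objective (25) is given below, assuming the matrices \(JG^{-1}J^{\top }(u_i)\) have already been estimated (the estimator of Perrault-Joncas and Meila 2013 is not reproduced here); the function name and interface are illustrative only.

```python
import numpy as np

def riemannian_relaxation_loss(H_list, alpha, m=None):
    """Objective (25): sum_i ||J G^{-1} J^T(u_i) - target||^2 * alpha_i (spectral norm).

    H_list : list of (n, n) estimated pushforward metrics J G^{-1} J^T(u_i)
    alpha  : sequence of N weights alpha_i
    m      : intrinsic dimension; if m < n, the identity target is replaced by R_m R_m^T
    """
    loss = 0.0
    for H_i, a_i in zip(H_list, alpha):
        n = H_i.shape[0]
        if m is None or m == n:
            target = np.eye(n)
        else:
            U, _, _ = np.linalg.svd(H_i)            # top-m singular vectors of the estimate
            target = U[:, :m] @ U[:, :m].T          # R_m R_m^T
        loss += a_i * np.linalg.norm(H_i - target, ord=2) ** 2   # ord=2 is the spectral norm
    return loss
```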

From the perspective of our Riemannian distortion framework, assuming the rank of \(J G^{-1} J^{\top }\) is m and the weights \(\alpha _i\) in (25) are set to \({\tilde{d}}_{i}\) (obtained from Algorithm 1), the objective function in (25) can be expressed as

$$\begin{aligned} \min _{f} \mathcal {D}(f) = \int _{{\mathcal {M}}} \max _{i} (\lambda _i-1)^2 \sqrt{\det G} \, dx^1 \cdots dx^m, \end{aligned}$$
(26)

where the \(\lambda _i\) are the m nonzero eigenvalues of \(J G^{-1} J^{\top }\), which are identical to those of \(J^{\top } J G^{-1}\). Since in practice the numerical estimation of \(J G^{-1} J^{\top } \in {\mathbb {R}}^{n\times n}\) may yield a rank higher than m when \(n > \text{ dim }({{\mathcal {M}}})\), one solution is to impose a soft constraint on the rank of \(J G^{-1} J^{\top }\), e.g., in McQueen et al. (2016) the optimization is formulated as

$$\begin{aligned} \min _{f} \mathcal {D}(f) = \int _{{\mathcal {M}}} \max \left( \max _{i \in \mathcal {I}_m} (\lambda _i-1)^2, \ \max _{i \not \in \mathcal {I}_m} \left( \frac{\lambda _i}{\epsilon }\right) ^2\right) \sqrt{\det G} \, dx^1 \cdots dx^m, \end{aligned}$$
(27)

where \(\lambda _i\) are the eigenvalues of \(J G^{-1} J^{\top }\), \(\mathcal {I}_m\) denotes the set of indices of the m largest eigenvalues, and \(\epsilon >0\) is a scalar parameter intended to suppress the smaller \((n-m)\) eigenvalues.
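
The per-point integrand of (27) can be written compactly in terms of the eigenvalues; the sketch below is illustrative only and omits the weighting by \(\sqrt{\det G}\) (or \({\tilde{d}}_i\) in the discretized form) and the summation over data points.

```python
import numpy as np

def rank_penalized_distortion(H_i, m, eps):
    """Per-point integrand of (27), computed from the eigenvalues of J G^{-1} J^T(u_i)."""
    lam = np.sort(np.linalg.eigvalsh(H_i))[::-1]        # eigenvalues in descending order
    top_m, rest = lam[:m], lam[m:]
    tail = np.max((rest / eps) ** 2) if rest.size else 0.0
    return max(np.max((top_m - 1.0) ** 2), tail)
```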

A.3 Proof of Proposition 2

Proof

For \(H = I\), the discretized formulation of the harmonic mapping distortion in the form of (15) is obtained as follows:

$$\begin{aligned} {{\mathcal {D}}}(Y)= & {} \sum _{i=1}^N \text {Tr}(JG^{-1}J^{\top }(x_i)) \ {\tilde{d}}_i \end{aligned}$$
(28)
$$\begin{aligned}= & {} \frac{1}{2} \sum _{i=1}^N \text {Tr} \left( Y(\text {diag} (L_i) - e_i e_i^{\top } L - L^{\top } e_i e_i^{\top })Y^{\top } \right) {\tilde{d}}_i \end{aligned}$$
(29)
$$\begin{aligned}= & {} \frac{1}{2}\text {Tr} \left( Y(\text {diag} ( \mathbb {1}_N^{\top } {\tilde{D}} L) - {\tilde{D}} L - L^{\top } {\tilde{D}} ) Y^{\top } \right) \end{aligned}$$
(30)
$$\begin{aligned}= & {} \frac{1}{c \ h} \text {Tr}(Y({\tilde{D}}-{\tilde{K}})Y^{\top }), \end{aligned}$$
(31)

where \({\tilde{K}}, {\tilde{d}}_i, {\tilde{D}}\) are obtained from Algorithm 1, \(L = \frac{1}{c \ h} ({\tilde{D}}^{-1}{\tilde{K}} - I)\in {\mathbb {R}}^{N \times N}\) is the graph Laplacian from Algorithm 1, \(L_i\) is the i-th row of L, \(e_i = (0,\ldots ,1,\ldots ,0) \in {\mathbb {R}}^N\) is a standard basis vector whose i-th component is one, and \(\mathbb {1}_N \in {\mathbb {R}}^N\) denotes an N-dimensional vector whose components are all one. In deriving (28)–(31), the estimate of \(JG^{-1}J^{\top }\) at \(x_i\) in (14), and the equalities \(\text {Tr}(J^{\top } HJG^{-1}) = \text {Tr}(JG^{-1}J^{\top }H)\) and \(\mathbb {1}_N^{\top } {\tilde{D}} L = 0\) are used.

Partitioning the embedding matrix \(Y\) into a boundary block \(Y_b\) (the \(N_b\) boundary points) and a remaining block \(Y_r\) (the \(N_r\) interior points), with \({\tilde{K}}\) and \({\tilde{D}}\) partitioned conformably into the blocks \({\tilde{K}}_{bb}, {\tilde{K}}_{br}, {\tilde{K}}_{rr}\) and \({\tilde{D}}_{bb}, {\tilde{D}}_{rr}\), and given a constant matrix \(Y_b\) specified by the boundary condition, minimizing (31) over \(Y_r\) reduces to

$$\begin{aligned} \min _{Y_r} \text{ Tr }(Y({\tilde{D}}-{\tilde{K}})Y^{\top }) = \text{ Tr }(Y_{b}({\tilde{D}}_{bb}-{\tilde{K}}_{bb})Y_{b}^{\top } - 2Y_{b}{\tilde{K}}_{br}Y_r^{\top } + Y_r({\tilde{D}}_{rr}-{\tilde{K}}_{rr})Y_r^{\top }). \end{aligned}$$
(32)

A closed-form solution for \(Y_r\) is obtained as

$$\begin{aligned} Y_r = Y_b {\tilde{K}}_{br} ({\tilde{D}}_{rr}-{\tilde{K}}_{rr})^{-1} = Y_b W, \end{aligned}$$
(33)

where \(W = {\tilde{K}}_{br} ({\tilde{D}}_{rr}-{\tilde{K}}_{rr})^{-1} \in {\mathbb {R}}^{N_b \times N_r}\).

Assume that \({\tilde{K}}_{ij} = {\tilde{K}}_{ji} \ge 0\) for all \(i, j = 1, \ldots , N\), a graph with \({\tilde{K}}_{rr}\) as its adjacency matrix is connected, and \({\tilde{K}}_{br}\) is not a zero matrix. Then the matrix \(({\tilde{D}}_{rr}-{\tilde{K}}_{rr})\) becomes positive-definite, so that W always exists. The positive-definiteness of \(({\tilde{D}}_{rr}-{\tilde{K}}_{rr})\) can be shown from the following inequality: for any \(v = (v_1, \ldots , v_{N_r})\ne 0 \in {\mathbb {R}}^{N_r}\),

$$\begin{aligned} v^\top ({\tilde{D}}_{rr}-{\tilde{K}}_{rr}) v&= \sum _{i,j = 1}^{N_r} ({\tilde{D}}_{rr}-{\tilde{K}}_{rr})_{ij} v_i v_j \end{aligned}$$
(34)
$$\begin{aligned}&= \sum _{i = 1}^{N_r} ({\tilde{D}}_{rr})_{ii} v_i^2 - \sum _{i,j = 1}^{N_r} ({\tilde{K}}_{rr})_{ij} v_i v_j \end{aligned}$$
(35)
$$\begin{aligned}&= \sum _{i = 1}^{N_r} \left( \sum _{k = 1}^{N_b} ({\tilde{K}}_{br})_{ki} \right) v_i^2 + \frac{1}{2}\sum _{i,j = 1}^{N_r} ({\tilde{K}}_{rr})_{ij} (v_i - v_j)^2 > 0, \end{aligned}$$
(36)

where we use the fact that \(({\tilde{D}}_{rr})_{ii} = \sum _{k=1}^{N_b} ({\tilde{K}}_{br})_{ki} + \sum _{k = 1}^{N_r} ({\tilde{K}}_{rr})_{ki}\) in deriving (36). From the direct application of Cramer’s rule, it can be shown that each entry of \(({\tilde{D}}_{rr} - {\tilde{K}}_{rr})^{-1}\) is non-negative. Since every entry of \({\tilde{K}}_{br}\) is non-negative, all the entries of W are also non-negative. Furthermore, W satisfies the equation \(\mathbb {1}_{N_r}^{\top } = \mathbb {1}_{N_b}^{\top } W\) from the equality \({\tilde{D}}_{rr} \mathbb {1}_{N_r} = {\tilde{K}}_{rr} \mathbb {1}_{N_r} + {\tilde{K}}_{br}^{\top } \mathbb {1}_{N_b}\); hence the entries of each column of W sum to one. \(\square \)
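
The closed-form solution (33) amounts to a single linear solve. The following sketch assumes the blocks \({\tilde{K}}_{br}, {\tilde{K}}_{rr}\) of the kernel and the fixed boundary embeddings \(Y_b\) are given; a solve is used in place of an explicit inverse, and the function name is illustrative.

```python
import numpy as np

def harmonic_interior_embedding(K_br, K_rr, Y_b):
    """Closed-form solution (33): Y_r = Y_b K_br (D_rr - K_rr)^{-1}.

    K_br : (N_b, N_r) boundary-to-interior block of K-tilde
    K_rr : (N_r, N_r) interior block of K-tilde (symmetric, nonnegative)
    Y_b  : (n, N_b) fixed boundary embeddings (one point per column)
    """
    # (D_rr)_{ii} collects the full row sums of K-tilde restricted to the r-indices
    d_rr = K_br.sum(axis=0) + K_rr.sum(axis=0)
    A = np.diag(d_rr) - K_rr              # positive-definite under the assumptions above
    # Y_r A = Y_b K_br  =>  Y_r = (Y_b K_br) A^{-1}; A is symmetric
    return np.linalg.solve(A, (Y_b @ K_br).T).T
```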

B: Experimental details for Section 4

B.1 Swiss roll

Here we explain further experimental details for the case study performed in Sect. 4.1. The data points are non-uniformly sampled; referring to the unfolded manifold in Fig. 3a, the density is set to oscillate along the horizontal axis while remaining uniform along the vertical axis. When choosing the initial parameter value \(\theta _0\) for Algorithm 2, locality-preserving embeddings are preferable. As such an initial parameter value, we use the two-dimensional embedding obtained from Isomap (Tenenbaum et al. 2000). Any other embeddings that preserve locality can also be used as initial guesses, e.g., those from locally linear embedding (LLE; Roweis and Saul 2000), Laplacian eigenmap (LE; Belkin and Niyogi 2003), diffusion map (DM; Coifman and Lafon 2006), Hessian eigenmap (HLLE; Donoho and Grimes 2003), or local tangent space alignment (LTSA; Zhang and Zha 2004).

For the embedding obtained from the Isomap method, we test five different scalings of it as the initial parameter value \(\theta _0\) for Algorithm 3; we then choose the output embedding that best matches the pairwise distances among the ten nearest neighbors in the ambient space. Also note that the kernel bandwidth parameter h for the approximation of the graph Laplacian in Algorithm 2 is chosen to be of the same order as the average nearest-neighbor distance from each of the data points, following Lafon (2004).
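
For reference, the average nearest-neighbor distance used in this bandwidth heuristic can be computed as in the sketch below (a brute-force NumPy version; the exact scaling of h relative to this quantity depends on the kernel convention used in Algorithm 1).

```python
import numpy as np

def average_nn_distance(X):
    """Average distance from each data point to its nearest neighbor (brute force).

    X : (N, D) array of ambient-space data points.
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                                  # exclude self-distances
    return float(np.mean(np.sqrt(d2.min(axis=1))))

# h is then chosen on the order of this value; the precise scaling depends on the
# kernel convention of Algorithm 1:
# h = average_nn_distance(X)
```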

B.2 Synthetic P(2) data

B.2.1 Details for the submanifold considered in Section 4.2

The tangent space of \(\text {P(n)}\) at any \(P \in \text {P(n)}\) can be identified with \(\text {S(n)}\), the space of \(n \times n\) symmetric matrices. Given \(X, Y \in \text {S(n)}\), the affine-invariant Riemannian metric at P is defined by the inner product

$$\begin{aligned} \langle X, Y \rangle _{P} = \text {Tr}(P^{-1} X P^{-1} Y). \end{aligned}$$
(37)

Consider the following orthogonal decomposition of \(P \in \text {P(2)}\):

$$\begin{aligned} P = RSR^{\top }, \end{aligned}$$
(38)

where \(R = \left[ \begin{array}{rr} \cos \theta &{} \quad -\sin \theta \\ \sin \theta &{} \quad \cos \theta \end{array} \right] \in \text {SO(2)}\) with \(\theta \in [0, \frac{\pi }{2})\), and \(S = \text {diag}(e^{p}, e^{q})\) for scalars \(p, q\). A local coordinate chart can be defined in terms of \((p,q,\theta )\) on an open subset \(U = \{P \in \text {P(2)} \, | \, P \ne c I \ \ \text {for} \ \ c > 0\}\). The affine-invariant Riemannian metric in (37) is then represented in \((p,q,\theta )\)-coordinates (at \(p \ne q\)) as

$$\begin{aligned} ds^2 = \text {Tr}((P^{-1}dP)^2) = {dp}^2 + {dq}^2 + 2\left( e^{p-q} + e^{q-p} - 2\right) d\theta ^2. \end{aligned}$$
(39)
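
The coordinate expression (39) can be verified numerically: the sketch below builds \(P = RSR^{\top }\) and its coordinate differential analytically and compares \(\text {Tr}((P^{-1}dP)^2)\) with the right-hand side of (39) for an arbitrary coordinate direction (illustrative code, not part of the original experiments).

```python
import numpy as np

def check_metric(p, q, theta, dp, dq, dtheta):
    """Compare Tr((P^{-1} dP)^2) with dp^2 + dq^2 + 2(e^{p-q} + e^{q-p} - 2) dtheta^2."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    dR = np.array([[-s, -c], [c, -s]])                    # dR/dtheta
    S = np.diag([np.exp(p), np.exp(q)])
    P = R @ S @ R.T
    dP = (R @ np.diag([np.exp(p) * dp, np.exp(q) * dq]) @ R.T
          + dtheta * (dR @ S @ R.T + R @ S @ dR.T))       # coordinate differential of P
    P_inv = np.linalg.inv(P)
    lhs = np.trace(P_inv @ dP @ P_inv @ dP)
    rhs = dp**2 + dq**2 + 2.0 * (np.exp(p - q) + np.exp(q - p) - 2.0) * dtheta**2
    return lhs, rhs

# e.g., check_metric(0.3, -0.7, 0.4, 0.1, -0.2, 0.05) returns two (nearly) equal values.
```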

For the case study in Sect. 4.2, the data set shown in Fig. 4a is generated by joining two cylinders (with a hole) \(\mathcal {C}_1\) and \(\mathcal {C}_2\) in \((p, q, \theta )\)-coordinates, where \(\mathcal {C}_1 = \{(p, q, \theta ) \ | \ p = \sin \theta _S, \; q = -1 + \cos \theta _S, \; \theta _S \in \left[ 0, \frac{4}{3}\pi \right] , \; \theta \in \left[ 0, \frac{\pi }{4} \right] \}\) and \(\mathcal {C}_2 = \{(p, q, \theta ) \ | \ p = \sin \theta _S, \; q = 1 - \cos \theta _S, \; \theta _S \in \left[ -\frac{4}{3}\pi , 0 \right] , \; \theta \in \left[ 0, \frac{\pi }{4} \right] \}\) (see Fig. 4a; the backbone curve in the figure corresponds to the direction along which \(\theta _S\) varies). The affine-invariant Riemannian metric on this submanifold (at \(\theta _S \ne 0\)) is obtained in terms of coordinates \((\theta , \theta _S)\), \(\theta _S \ne 0\), as

$$\begin{aligned} ds^2 = d\theta _S^2 + 2\left( e^{1 + \sqrt{2}\sin (|\theta _S| - \frac{\pi }{4})} + e^{-1 - \sqrt{2}\sin (|\theta _S| - \frac{\pi }{4})} - 2\right) d\theta ^2. \end{aligned}$$
(40)

Because of the nonzero Riemannian curvature of this submanifold, isometric embeddings in two-dimensional Euclidean space do not exist.
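
For illustration, points on this submanifold can be synthesized and mapped back to \(\text {P(2)}\) as sketched below; the sampling here is uniform in \((\theta , \theta _S)\) and does not reproduce the paper's sampling density or the hole in each cylinder.

```python
import numpy as np

def sample_joined_cylinders(n_points, seed=0):
    """Sample (theta, theta_S) on the joined cylinders C1, C2 and map each point to P(2)."""
    rng = np.random.default_rng(seed)
    theta_S = rng.uniform(-4 * np.pi / 3, 4 * np.pi / 3, n_points)   # backbone parameter
    theta = rng.uniform(0.0, np.pi / 4, n_points)
    p = np.sin(theta_S)
    q = np.where(theta_S >= 0, np.cos(theta_S) - 1.0, 1.0 - np.cos(theta_S))

    P = np.empty((n_points, 2, 2))
    for i in range(n_points):
        c, s = np.cos(theta[i]), np.sin(theta[i])
        R = np.array([[c, -s], [s, c]])
        P[i] = R @ np.diag([np.exp(p[i]), np.exp(q[i])]) @ R.T       # P = R S R^T
    return P, np.stack([theta, theta_S], axis=1)
```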

B.2.2 Evaluation of the pairwise distance and tangent vector angle errors

For data points \(x_i \in \text {P(2)}\) and corresponding embeddings \(y_i \in {\mathbb {R}}^2\), \(i = 1, \ldots , N\), the pairwise distance error for k nearest points is defined as

$$\begin{aligned} \text {Pairwise distance error} = \frac{1}{k N}\sum _{i=1}^N\sum _{j\in \{\text {NN}_k(i)\}}\left( \Vert y_i-y_j\Vert - \text {dist}(x_i, x_j)\right) ^2, \end{aligned}$$
(41)

where \(y_i\) denotes the optimal embedding of \(x_i\), \(\text {NN}_{k}(i)\) denotes the set of indices of k nearest neighbor points to \(x_i\), and \(\text {dist}(x_i,x_j)\) denotes the ground truth distance between \(x_i\) and \(x_j\), i.e., the geodesic distance measured on the submanifold. To measure angles, the tangent vectors are approximated by the difference between nearest neighbors. The tangent vector angle error is defined as

$$\begin{aligned}&\text {Tangent vector angle error} \nonumber \\&\quad = \frac{2}{k(k-1)N}\sum _{l=1}^N\sum _{i,j \in \{\text {NN}_{k}(l)\}} \left( \text {acos}(\langle v_i, v_j\rangle ) - \text {acos}(\langle V_i, V_j\rangle )\right) ^2, \end{aligned}$$
(42)

where \(v_i\) and \(V_i\) denote the tangent vectors from the l-th data point to the i-th data point in the optimal embedding and in the original data, respectively, and \(\langle \cdot , \cdot \rangle \) denotes the inner product.

When reporting the final manifold learning results in Table 1, the reference values for the pairwise distance error are the minimal geodesic distances computed numerically on the submanifolds, and the inner product in (37) is used to compute the reference values for the angles between tangent vectors.
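
The two error measures (41) and (42) can be computed directly from the embeddings once the ground-truth geodesic distances and tangent-vector angles are available; the sketch below assumes precomputed nearest-neighbor indices and ground-truth quantities, and normalizes the approximate tangent vectors before taking the arccosine (an assumption about the convention in (42)).

```python
import numpy as np

def pairwise_distance_error(Y, geo_dist, nn_idx):
    """Equation (41). Y: (N, n) embeddings; geo_dist: (N, N) ground-truth geodesic
    distances on the submanifold; nn_idx: (N, k) indices of the k nearest neighbors."""
    N, k = nn_idx.shape
    err = sum((np.linalg.norm(Y[i] - Y[j]) - geo_dist[i, j]) ** 2
              for i in range(N) for j in nn_idx[i])
    return err / (k * N)

def tangent_angle_error(Y, true_angles, nn_idx):
    """Equation (42). true_angles[l] is a (k, k) array of ground-truth angles between
    the tangent vectors V_i, V_j at the l-th point (computed with the inner product (37));
    embedding-side tangent vectors are the normalized differences y_i - y_l."""
    N, k = nn_idx.shape
    err, count = 0.0, 0
    for l in range(N):
        V = Y[nn_idx[l]] - Y[l]                                       # approximate tangents
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        for a in range(k):
            for b in range(a + 1, k):
                ang = np.arccos(np.clip(V[a] @ V[b], -1.0, 1.0))
                err += (ang - true_angles[l][a, b]) ** 2
                count += 1
    return err / count                                                # count = N k (k-1) / 2
```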

B.3 Human mass-inertia data

B.3.1 Synthesizing human mass-inertia data

Since mass-inertial parameter data for humans are not readily available, we use human shape data from Yang et al. (2014) to synthesize this data set; specifically, assuming uniform mass density, we integrate over the volumes of the human body shapes to construct mass-inertial parameter data for the corresponding \(N_l=10\) links.
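
The uniform-density integration can be carried out per body segment with standard mesh formulas; the sketch below computes only the mass and center of mass of a closed triangle mesh via signed tetrahedra. The link segmentation into \(N_l=10\) parts, the inertia tensors, and the density value are not reproduced here and are illustrative.

```python
import numpy as np

def mass_properties_uniform(vertices, faces, density=1000.0):
    """Mass and center of mass of a closed triangle mesh under uniform density,
    using signed tetrahedra formed with the origin (divergence theorem).

    vertices : (V, 3) float array of mesh vertices
    faces    : (F, 3) integer array of outward-oriented triangles
    """
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    signed_vol = np.einsum('ij,ij->i', v0, np.cross(v1, v2)) / 6.0    # per-tetrahedron volume
    volume = signed_vol.sum()
    centroid = ((v0 + v1 + v2) / 4.0 * signed_vol[:, None]).sum(axis=0) / volume
    return density * volume, centroid   # the inertia tensor follows from second moments
```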

Fig. 8 Principal components (PCs) of the human mass-inertia data obtained from PGA and vector space PCA. a, b respectively depict the third and fourth PCs obtained from PGA over the range of \(\pm 4\) standard deviations from the mean; c, d respectively depict the third and fourth PCs obtained from PCA over different ranges of standard deviations from the mean

B.3.2 Further principal components of human mass-inertia data

As a supplement to Fig. 5 in Sect. 4.3, here we provide the third and fourth principal components of the human mass-inertia data obtained from both principal geodesic analysis (PGA) and vector space principal component analysis (PCA). The variations corresponding to the third and fourth principal components of PGA are shown in Fig. 8a–b. Principal component 3 captures variations in height and torso thickness, and principal component 4 captures variations mainly in height.

In the case of vector space PCA, shown in Fig. 8c–d, the variations near the mean are qualitatively similar to those obtained for PGA. However, the positive-definiteness requirement is violated even for data points just 0.5 standard deviations away from the mean; the ellipsoids for these inertial parameters collapse, as highlighted by the dashed red ellipses in Fig. 8c–d.
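
This violation can be checked directly by monitoring the smallest eigenvalue along a vector-space principal direction. The sketch below assumes each link's inertial parameters have been arranged into a symmetric matrix whose positive-definiteness is required (e.g., a pseudo-inertia matrix); this representation, and the names P_mean and P_dir, are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def min_eig_along_pc(mean_mat, pc_mat, t_values):
    """Smallest eigenvalue of mean_mat + t * pc_mat for each t (both matrices symmetric).

    A negative value indicates that the vector-space PCA reconstruction has left the
    positive-definite cone (the corresponding inertia ellipsoid collapses)."""
    return [float(np.linalg.eigvalsh(mean_mat + t * pc_mat).min()) for t in t_values]

# e.g., min_eig_along_pc(P_mean, P_dir, np.linspace(-4, 4, 17)) flags the range of
# standard deviations over which positive-definiteness is lost (P_mean, P_dir hypothetical).
```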


Cite this article

Jang, C., Noh, YK. & Park, F.C. A Riemannian geometric framework for manifold learning of non-Euclidean data. Adv Data Anal Classif 15, 673–699 (2021). https://doi.org/10.1007/s11634-020-00426-3
