1 Introduction

A semi-supervised method uses unlabeled data alongside the given labeled data during training to exploit the hidden intrinsic geometrical information. It implicitly relies on one of three assumptions about the underlying data: smoothness, clustering, or the manifold assumption [27]. Manifold learning methods exploit the manifold assumption and attempt to preserve geometric properties such as distances, proximity, angles, or local patches [22].

Real-world data gathered from imaging devices, medical science, and business applications usually lie in a high-dimensional space, which causes the curse of dimensionality. One of the main objectives in the analysis of such high-dimensional datasets is to learn their geometrical and topological structure. Generally, the data is parameterized as points in \({\mathbb {R}}^{D}\); the correlation between parameters often suggests the manifold assumption that the data points are distributed on a very low-dimensional space \({\mathbb {R}}^{m}\), i.e., on a Riemannian manifold embedded in \({\mathbb {R}}^{D}\) with \(m\ll D\) [2, 5, 6, 30, 33]. Manifold learning algorithms transform the high-dimensional data into a low-dimensional embedding space using dimensionality reduction methods. Principal component analysis (PCA) [9, 35, 40, 47], multidimensional scaling (MDS) [13,14,15, 19], and linear discriminant analysis (LDA) [3, 7, 20] are popular linear dimensionality reduction algorithms. They provide a true representation in the case of a linear manifold but fail to discover nonlinear or curved structures in the input data. For handwritten characters, spoken letters, medical images, etc., the manifolds do not follow linear properties and have a nonlinear structure. The intrinsic geometry of a nonlinear manifold is explored by identifying the optimal local neighborhood around each observation. We assume that the data samples \(x_{i}\in X\) are drawn from a smooth Riemannian manifold \({\mathcal {M}}\subset {\mathbb {R}}^{D}\). If a smooth Riemannian manifold is unfolded correctly, observations lying spatially near each other on the manifold should remain near each other in the lower-dimensional space as well, with their tangent spaces aligned.

Generally, due to the varying curvature of the manifold around each observation, finding such an optimal neighborhood on the manifold is a challenge. Consequently, the affinities calculated between observations on the manifold are erroneous, as they may be affected by noise. On such a manifold, the tangent planes at the observations are not aligned.

Manifold learning approaches are suited to unfolding nonlinear structures into a flat low-dimensional embedding space [22]. The aim of these approaches is to identify and exploit locally linear regions. Existing state-of-the-art algorithms such as isometric feature mapping (ISOMAP) [33], locally linear embedding (LLE) [29], Laplacian eigenmap (LE) [4], local tangent space alignment (LTSA) [36, 45], Hilbert–Schmidt independence criterion-regularized LTSA (HSIC–LTSA) [46], graph-regularized linear discriminant analysis (GRLDA) [16], jerk-based manifold regularization [39], and the robust Laplacian [1] identify and exploit such local structures. These methods have been applied to a wide variety of applications, for instance face recognition, facial expression transfer, handwriting identification, 3D body pose recovery, and medical imaging. One such approach for face recognition is the two-dimensional neighborhood preserving projection (2DNPP) [42].

These state-of-the-art manifold learning methods can be categorized into distance-preserving, angle-preserving, and proximity-preserving methods, which align the local neighborhood of each data point into a global coordinate space. Each method focuses on one perspective in order to preserve a single geometric property. For instance, Isomap is a distance-preserving method, while LE, LLE, and LTSA are proximity-preserving methods, which assume that unfolding the manifold results in aligned tangent planes of all neighboring observations on the manifold [44]. LTSA assumes that the given data is uniformly distributed and that data in a local neighborhood of the manifold follows linear properties, i.e., it lies in or close to a linear subspace.

In LE, the diffusion map (DM) [23], and the vector diffusion map (VDM) [31], the data is represented as a weighted undirected graph. The vertices of the graph correspond to the data observations, and the weights on the edges quantify the affinity between them. On a locally linear manifold, the Euclidean distance is used as the affinity metric, expressed through a kernel function of the distance. If the data \(\{x_{i}\}_{i=1}^{n}\) consists of n observations in \(L_{2}({\mathbb {R}}^{3})\), then the distance between points \(x_i\) and \(x_j\) is calculated using Eq. (1)

$$\begin{aligned} d_E(x_i,x_j) = || x_i - x_j ||_{L_{2}({\mathbb {R}}^{3})} ,\end{aligned}$$
(1)

and affinity of edges is calculated by Eq. (2)

$$\begin{aligned} w_{ij} = e^{-d_E^2(x_i,x_j)/2}. \end{aligned}$$
(2)
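For concreteness, the following NumPy sketch computes the pairwise distances of Eq. (1) and the corresponding edge affinities of Eq. (2) for a small set of observations; the function and variable names are ours, not from the paper.

```python
import numpy as np

def gaussian_affinity(X):
    """Pairwise affinities w_ij = exp(-d_E^2(x_i, x_j) / 2) of Eqs. (1)-(2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    d2 = np.maximum(d2, 0.0)                                     # guard against tiny negatives
    return np.exp(-d2 / 2.0)

# Toy usage: five observations in R^3
X = np.random.default_rng(0).normal(size=(5, 3))
W = gaussian_affinity(X)   # symmetric, with ones on the diagonal
```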

To call an embedding faithful, we check whether it preserves the local structure of neighborhoods on the manifold, i.e., whether it handles distances, angles, and neighborhoods in a comprehensive way.

LTSA assumes the local linearity of the manifold. Thus, local linear approximations of the manifold are constructed as a collection of overlapping approximate tangent spaces at every observation. These are then globally aligned to construct the global coordinate system for the underlying nonlinear manifold [45]. Here, the local tangent space provides a low-dimensional linear approximation of the local geometric structure of the nonlinear manifold. The proposed rotation-based regularization method is based on the observation that the Riemannian assumption of local linearity of the manifold may not hold in a kNN neighborhood; hence, a dimensionality reduction method that relies on this assumption may yield suboptimal performance. This implies that the local neighborhood must be flattened so that the Euclidean distance becomes an accurate measure of affinity. The diffusion map assumes that the Euclidean distance between observations approximates the diffusion distance in the original feature space between probability distributions centered at those observations [23]. It assumes a nonlinear geometry and measures the similarity between two points at a specific scale through a diffusion metric.

In this paper, we determine accurate pairwise affinities by aligning the tangent spaces of all the local points with respect to the point of interest, thereby exploiting the property that the tangent spaces of observations in an optimal neighborhood are aligned. Rotation is used to align the neighbors that deviate from the tangent plane of the point of interest so that they lie on the same Euclidean plane; points already on the plane are unaffected by the rotation. This gives an enhanced affinity for the Laplacian, which is useful when the data is affected by noise and the manifold curvature varies.

The contributions of this work are as follows:

  1. In the proposed approach, the Riemannian-manifold assumption of local linearity of the kNN graph neighborhoods around data points is enforced by flattening the manifold, i.e., by rotating the tangent spaces of the neighbors to align with the tangent space of the data point of interest. The pairwise Euclidean distance between data points then becomes an accurate measure of the geodesic distance between vertices.

  2. The updated affinities based on the pairwise Euclidean distances are used in graph Laplacian-based manifold regularization. This yields higher classification accuracy, as the modified graph Laplacian, the rotation-based Laplacian, gives a better estimate of the underlying marginal distribution.

The remainder of this paper is organized as follows: Sect. 2 defines the problem to be solved in this work. In Sect. 3, we propose our rotation-based Laplacian regularization approach for manifold learning and regularization. Section 4 contains the results obtained using our method and its comparison with state-of-the-art methods. Finally, Sect. 5 concludes our work by highlighting the salient features of rotation-based regularization.

2 Problem definition

On a Riemannian manifold, the locally linear neighborhood assumption allows the Euclidean distance to be used as a measure of affinity between neighboring data points. Due to the unknown properties of the manifold, the identified neighbors may not lie in the locally linear patch around the point of interest. As the extent of the locally linear patch is unknown, a kNN or \(\epsilon\) neighborhood is chosen heuristically as the linear neighborhood, and the Euclidean distance is computed and used as the affinity measure between data points in this neighborhood, which is assumed to be linear. In such cases, the Euclidean distance between data points in the neighborhood fails to represent the affinity between them accurately. This requires either accurate determination of the linear region, which is difficult, or linearization of the kNN or \(\epsilon\) neighborhood.

3 Rotation-based regularization method

Manifold regularization uses the smoothness assumption that a function f should change slowly where the marginal probability density is high. This requires estimation of the marginal probability density. In semi-supervised learning, the unknown marginal distribution is estimated using the given data, especially the ample amount of unlabeled data. If the data points on the manifold are represented by a graph, the smoothness of the function f on the graph can be measured in terms of a quadratic form of the graph Laplacian; specifically, the graph Laplacian can be used to estimate the marginal distribution. The data points are the vertices of the graph, and a distance needs to be associated with each pair of adjacent vertices. This entails an accurate estimation of the geodesic distance between vertices. On a Riemannian manifold, the Euclidean distance is an accurate measure of the geodesic distance in a locally linear region. Thus, the problem of determining the geodesic distance between adjacent vertices of the graph reduces to finding the locally linear region. If a small region around a data point is flattened, the Euclidean distance is an accurate measure of the geodesic distance between the vertices. This leads to an accurate estimate of the graph Laplacian and, through it, of the underlying marginal distribution. In the proposed rotation-based regularization method, the kNN neighborhood is linearized by rotating the tangent planes of the data points in the neighborhood, followed by semi-supervised classification using the updated affinities computed between the tangent-space-aligned data points.

3.1 Neighborhood linearization through rotation

The proposed linearization method endeavors to flatten the local neighborhood around a data point, chosen as its kNN data points. This is achieved by rotating the tangent planes of the neighboring data points with respect to the tangent plane of the point under consideration. A local linear graph of the dataset is created by fixing the neighborhood of every data point using kNN. The tangent plane at a point is found using local PCA, as are the tangent planes of its k neighboring points. Once the tangent planes are determined, the tangent plane of the point of interest is fixed and the misaligned tangent planes of its neighbors are rotated into alignment with it. This flattens the chosen neighborhood, and the Euclidean distance can then be used as a measure of affinity between data points.

Given n data samples with l labeled and \((n - l)\) unlabeled points, where \((n - l) \gg l\), on a smooth Riemannian manifold \({\mathcal {M}}\), i.e., \(\{x_{i}\}_{i=1}^{n}\in {\mathbb {R}}^{D}\) that actually lie on a much lower-dimensional space \({\mathbb {R}}^{m}\) with \(m \ll D\), the manifold can be represented by

$$\begin{aligned} f:C\subset {\mathbb {R}}^{m}\rightarrow {\mathbb {R}}^{D}, \end{aligned}$$
(3)

where C is a compact subset of \({\mathbb {R}}^{m}\) and f is the data generation function, i.e.,

$$\begin{aligned} x_{i}=f(\tau _{i})+\eta _{i}, \end{aligned}$$
(4)

where \(\tau _{i}\) are original feature vectors or the lower-dimensional complement information and \(\eta _{i}\) is redundant data or noise. The noise may be introduced during various stages of data collection and preprocessing and may vary with the distance.

A manifold can be approximated by a graph, with a smooth function defined on the graph. The corresponding graph Laplacian depends on the affinity matrix W as

$$\begin{aligned} L=D-W, \end{aligned}$$
(5)

where the elements of the affinity matrix W are calculated using the heat kernel \(w_{ij} = \frac{1}{C}\exp \left( -\frac{d_{ij}}{\epsilon ^2}\right)\), where \(d_{ij}\) is the distance between points \(x_i\) and \(x_j\), and the diagonal matrix D has entries \(D_{ii}=\sum _{j=1}^{n}w_{ij}\). kNN is used to create the undirected graph over the given data points, including both labeled and unlabeled ones.
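A minimal sketch of this construction, assuming a symmetrized kNN graph and illustrative values for k, \(\epsilon\) and the normalization constant C (none of which are prescribed here):

```python
import numpy as np

def knn_heat_laplacian(X, k=6, eps=1.0, C=1.0):
    """Unnormalized graph Laplacian L = D - W of Eq. (5) on a symmetrized kNN graph.

    Edge weights follow the heat kernel w_ij = (1/C) * exp(-d_ij / eps**2);
    k, eps and C are illustrative choices, not values prescribed by the paper.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]                         # k nearest neighbors, excluding x_i itself
        W[i, nbrs] = np.exp(-dist[i, nbrs] / eps ** 2) / C
    W = np.maximum(W, W.T)                                          # symmetrize: undirected graph
    D = np.diag(W.sum(axis=1))                                      # degrees D_ii = sum_j w_ij
    return D - W, W
```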

Proposition 1

According to the manifold learning assumption, on a manifold \({\mathcal {M}}\), the tangent planes of points \(x_i\) and \(x_k\) lying in a locally linear region are aligned.

Given a data point \(x_i\) and a neighbor \(x_k\in N(x_i)\), the tangent planes of \(x_i\) and each \(x_k\) should be aligned:

$$\begin{aligned} {T_{x_{i}}} \equiv {T_{x_{k}}}. \end{aligned}$$
(6)

To find the tangent plane \({T_{x_{i}}}\) of the point \(x_i\), local PCA is performed on the set of k nearest neighbors \(N_k( x_i )\) of point \(x_i\).

$$\begin{aligned} T_{x_i}= N_k( x_i )\cdot V, \end{aligned}$$
(7)

where V is the weight (loading) matrix and \(T_{x_i}\) contains the principal component scores. The m leading eigenvectors form an orthogonal basis \(V_m\) of \({T_{x_{i}}}\):

$$\begin{aligned} T_{x_{i}}= N_k( x_i )\cdot V_m. \end{aligned}$$
(8)
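As a concrete illustration of Eqs. (7)–(8), the sketch below estimates the tangent plane at \(x_i\) with a local PCA of its k nearest neighbors; we center the neighborhood before projecting, which Eq. (7) leaves implicit, and the helper name and signature are our own.

```python
import numpy as np

def local_tangent_plane(X, nbr_idx, m):
    """Local PCA of Eqs. (7)-(8): tangent-space scores and orthonormal basis V_m.

    X       : (n, D) data matrix
    nbr_idx : indices of the k nearest neighbors N_k(x_i)
    m       : intrinsic dimension
    """
    N = X[nbr_idx]                                 # neighborhood N_k(x_i), shape (k, D)
    mu = N.mean(axis=0)
    Nc = N - mu                                    # center before PCA (left implicit in Eq. (7))
    _, _, Vt = np.linalg.svd(Nc, full_matrices=False)
    V_m = Vt[:m].T                                 # (D, m) m leading principal directions
    T = Nc @ V_m                                   # (k, m) principal component scores T_{x_i}
    return T, V_m, mu
```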

It is known that on a manifold, the geodesic distance is the shortest distance between any two data points, and it can be assumed Euclidean if the data points lie in a locally linear region. However, since the extent of the linear region around a data point is not known, the geodesic distance may not be Euclidean in a chosen neighborhood.

To find the correct Euclidean distance between a point \(x_i\) and its neighbor \(x_k\), we rotate \({T_{x_{k}}}\) w.r.t. \({T_{x_{i}}}\) and align them:

$$\begin{aligned} {T_{x_{i}}} \equiv \gamma {T_{x_{k}}}, \end{aligned}$$
(9)

where \(\gamma\) is the orthogonal rotation matrix calculated using Procrustes analysis [28],

$$\begin{aligned} \xi (\gamma ,\phi ,\rho ) = \sum _{x_k \in N_k( x_i )}\parallel {T_{x_{i}}} - \rho \gamma {T_{x_{k}}} - \phi \parallel, \end{aligned}$$
(10)

where \(\phi\) denotes translation, \(\gamma\) denotes rotation, and \(\rho\) denotes scaling. In the ideal case of a locally linear neighborhood, \(\phi\) will be a zero matrix, \(\gamma\) an identity matrix, and \(\rho\) unity. But due to the nonlinear surface, we optimize the parameters using

$$\begin{aligned} \{ {\overline{\gamma }},{\overline{\phi }},{\overline{\rho }} \} = \mathop {\mathrm {argmin}}_{\gamma ,\phi ,\rho } \xi (\gamma ,\phi ,\rho ). \end{aligned}$$
(11)

This idea is depicted in Fig. 1. To obtain the optimal rotation \({{\overline{\gamma }}}\), the tangent-coordinate matrices are first centered to a common centroid:

$$\begin{aligned} {\overline{T}}_{x_{i}}&= {T}_{x_{i}} - \frac{1}{k}\sum _{x_j \in N_k( x_i )}{T_{x_{j}}} \nonumber \\ {\overline{T}}_{x_{k}}&= {T}_{x_{k}} - \frac{1}{k}\sum _{x_j \in N_k( x_k )}{T_{x_{j}}}, \end{aligned}$$
(12)

where k is the fixed number of neighbors. Putting these values in Eq. (10)

$$\begin{aligned} \xi ({\overline{\gamma }},{\overline{\phi }},{\overline{\rho }})&= \sum _{x_k \in N_k( x_i )}\parallel {\overline{T}}_{x_{i}} - \rho \gamma {\overline{T}}_{x_{k}} \parallel ^2 \nonumber \\&= \sum _{x_k \in N_k( x_i )}{\overline{T}}_{x_{i}}^T{\overline{T}}_{x_{i}} + {\overline{T}}_{x_{k}}^T{\overline{T}}_{x_k} - 2\rho \gamma {\overline{T}}_{x_{i}}^T{\overline{T}}_{x_{k}}. \end{aligned}$$
(13)

Let

$$\begin{aligned} P = \sum _{x_k \in N_k( x_i )}{\overline{T}}_{x_{i}}^T{\overline{T}}_{x_{k}}.\end{aligned}$$
(14)

To minimize \(\xi ({\overline{\gamma }},{\overline{\phi }},{\overline{\rho }})\), the term P is maximized; if its eigenvalue decomposition is given by \(\nu \omega \nu ^T\), the optimal rotation will be

$$\begin{aligned} {\overline{\gamma }} = \omega \nu ^T .\end{aligned}$$
(15)
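The Procrustes step of Eqs. (12)–(15) can be sketched as follows. We center both tangent-coordinate matrices, form the cross-product matrix P, and read the optimal rotation off the singular value decomposition of P, which is the standard closed form; the paper states the same solution through the decomposition \(P = \nu \omega \nu ^T\) with \({\overline{\gamma }} = \omega \nu ^T\). The helper name and the per-matrix centering are our simplifications.

```python
import numpy as np

def procrustes_rotation(T_i, T_k):
    """Optimal rotation aligning T_{x_k} with T_{x_i} (Eqs. (12)-(15)).

    Both inputs are (k, m) tangent-coordinate matrices.  They are centered
    (Eq. 12), the cross-product matrix P is formed (Eq. 14), and the rotation
    that best aligns them is obtained from the SVD of P.
    """
    Ti_c = T_i - T_i.mean(axis=0)
    Tk_c = T_k - T_k.mean(axis=0)
    P = Tk_c.T @ Ti_c                  # m x m cross-product matrix
    U, _, Vt = np.linalg.svd(P)
    gamma = U @ Vt                     # orthogonal; apply as Tk_c @ gamma
    return gamma
```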

Since the affinity between points \(x_i\) and \(x_k\) computed in the tangent spaces is not the same as the affinity on the manifold, we reconstruct the point \(x_k\) in the original space using Eq. (16),

$$\begin{aligned} x^{\prime}_k = T^{\prime}\cdot V^{-1}, \end{aligned}$$
(16)

where \(T^{\prime}\) is the rotated tangent plane and V is the same weight matrix taken in Eq. (7).

$$\begin{aligned} T^{\prime}= {\overline{\gamma }}T_{x_k} .\end{aligned}$$
(17)

We then compute the squared Euclidean distance between data points \(x_i\) and \(x^{\prime}_k\) as \(d^{\prime}_{ik} = \parallel x_i - x^{\prime}_k \parallel _2 ^2\).

Fig. 1 Alignment of \({T_{x_{k}}}\) w.r.t. \({T_{x_{i}}}\) by rotation and finding the tangent plane \(\gamma {T_{x_{k}}}\)

This distance \(d^{\prime}_{ik}\) is used to calculate the revised affinity matrix using

$$\begin{aligned} w^{\prime}_{ik} = \frac{1}{C}\exp \left( -\frac{d^{\prime}_{ik}}{\epsilon ^2}\right) . \end{aligned}$$
(18)

The revised affinity enforces function smoothness in semi-supervised learning by identifying the affinity between neighboring data points accurately.
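A sketch of the reconstruction and affinity update of Eqs. (16)–(18) is given below. Since \(V_m\) is a tall orthonormal matrix, we use its transpose in place of the inverse in Eq. (16) and add back the neighborhood mean removed by the local PCA; both choices, and the convention that the caller passes the single tangent-space row representing \(x_k\), are assumptions of ours.

```python
import numpy as np

def revised_affinity(x_i, t_k, gamma, V_m, mu, eps=1.0, C=1.0):
    """Rotate a neighbor's tangent coordinates, lift them back to R^D and
    recompute the affinity to x_i (Eqs. (16)-(18)).

    t_k   : (m,) tangent coordinates representing the neighbor x_k
    gamma : (m, m) optimal rotation from the Procrustes step
    V_m   : (D, m) orthonormal loading matrix of Eq. (7)
    mu    : (D,) neighborhood mean removed by the local PCA
    """
    t_rot = t_k @ gamma                       # Eq. (17): rotated coordinates
    x_k_new = t_rot @ V_m.T + mu              # Eq. (16), with V_m^T standing in for V^{-1}
    d_ik = np.sum((x_i - x_k_new) ** 2)       # d'_ik = ||x_i - x'_k||_2^2
    return np.exp(-d_ik / eps ** 2) / C       # Eq. (18): revised heat-kernel affinity w'_ik
```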

3.2 Laplacian regularized least squares classifier (LapRLSC)

Let l labeled data points be given as \(\{x_i,y_i\}_{i=1}^l\) and \((n - l)\) unlabeled points as \(\{x_u\}_{u=l+1}^{n}\). The prediction function is trained using the given labeled data points [6]

$$\begin{aligned} f^* = \mathop {\mathrm {argmin}}\frac{1}{l}\sum _{i=1}^{l} \parallel y_i - f(x_i)\parallel ^2 + \lambda _A\parallel f\parallel _A^2 + \lambda _I\parallel f\parallel _I^2 ,\end{aligned}$$
(19)

where \(\parallel f\parallel _A^2\) and \(\parallel f\parallel _I^2\) are the penalty terms in the ambient and intrinsic spaces, respectively. The unlabeled input data enters the prediction function through the manifold assumption on the graph structure, with the data points as nodes and the affinities between them as edge weights. The intrinsic-space regularization term R(f) is calculated using [25]

$$\begin{aligned} R(f)=\frac{1}{2}\sum _{i,k=1}^{n}(f(x_i)-f(x_k))^2w^{\prime}_{ik} ,\end{aligned}$$
(20)

where \(w^{\prime}_{ik}\) is calculated using Eq. (18). Expanding Eq. (20) using Eq. (5), we get

$$\begin{aligned} R(f)&= \sum _{i=1}^{n}f(x_i)^2\sum _{k=1}^{n}w^{\prime}_{ik} - \sum _{i,k=1}^{n}w^{\prime}_{ik}f(x_i)f(x_k) \nonumber \\&= {\mathbf{f }}^TD\mathbf{f } - {\mathbf{f }}^TW{\mathbf{f }} = {\mathbf{f }}^TL{\mathbf{f }} ,\end{aligned}$$
(21)

where \({\mathbf{f }} = [f(x_1),f(x_2),\dots ,f(x_n)]^T\) and W and L are built from the revised affinities \(w^{\prime}_{ik}\). Substituting this into Eq. (19), we get

$$\begin{aligned} f^* = \parallel {\mathbf{Y }}_l - {\mathbf{f }}_l\parallel ^2 + \lambda _A\parallel f\parallel _A^2 + \lambda _I{\mathbf{f }}^TL{\mathbf{f }} ,\end{aligned}$$
(22)

where \({\mathbf{Y }}_l\) is the vector of true labels of the labeled points. According to the classical Representer theorem [6],

$$\begin{aligned} f^* = \sum _{i=1}^{n}\alpha _i{\mathcal {K}}(x_i,x), \end{aligned}$$
(23)

where the \(\alpha _i\) are representation coefficients and \({\mathcal {K}}\) is a Mercer kernel. Accordingly, substituting Eq. (23), the vector \({\mathbf{f }}\) can be written as

$$\begin{aligned} {\mathbf{f }}&= \left[ \sum _{i=1}^{n}\alpha _i{\mathcal {K}}(x_i,x_1), \sum _{i=1}^{n}\alpha _i{\mathcal {K}}(x_i,x_2), \dots , \sum _{i=1}^{n}\alpha _i{\mathcal {K}}(x_i,x_n) \right] \nonumber \\&= {\mathbf{Ka }}, \end{aligned}$$
(24)

where \({\mathbf{K }}\) is the kernel Gram matrix and \({\mathbf{a }}\) is the representation coefficient vector. Using the kernel property \({\mathcal {K}}(x_i,x) = \langle \phi (x_i),\phi (x) \rangle = \phi (x_i)^T\phi (x)\), where \(\phi\) is the kernel mapping, we can write the second term of Eq. (19) as

$$\begin{aligned} f = \sum _{i=1}^{n}\alpha _i\phi (x_i) = [\phi (x_1),\dots ,\phi (x_n)]{\mathbf{a }}, \end{aligned}$$

and

$$\begin{aligned} \parallel f \parallel _A^2 = {\mathbf{a }}^T{\mathbf{Ka}}. \end{aligned}$$
(25)

Substituting the values from Eqs. (24) and (25) into Eq. (22), we obtain

$$\begin{aligned} f(a)^* = \parallel {\mathbf{Y }}_l - {\mathbf{K }}_l{\mathbf{a }}\parallel ^2 + \lambda _A{\mathbf{a }}^T{\mathbf{Ka} } + \lambda _I{\mathbf{a }}^T{\mathbf{K }} L{\mathbf{Ka }}. \end{aligned}$$
(26)

The optimum solution, obtained by setting the partial derivative \(\frac{\partial f}{\partial {\mathbf{a}} } = 0\), is

$$\begin{aligned} a^* = \left( {\mathbf{K}} _l{\mathbf{K}} _l^T + \lambda _A{\mathbf{K}} + \lambda _I{\mathbf{K}} L{\mathbf{K}} \right) ^{-1} {\mathbf{K}} _l{\mathbf{Y}} _l. \end{aligned}$$
(27)
(Algorithm pseudocode, rendered as a figure in the original.)
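As a minimal sketch of Eq. (27), the code below assembles the closed-form coefficients with an RBF kernel and a binary ±1 label vector. Our reading of the notation is that \({\mathbf{K }}_l\) is the \(n\times l\) block of the kernel Gram matrix restricted to the labeled points; the helper names are ours, and the default \(\lambda\) values simply mirror those fixed for the comparison methods in Sect. 4.2.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """RBF (Gaussian) kernel Gram matrix between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def lap_rlsc_fit(X, y_l, L, lam_A=0.005, lam_I=0.045, sigma=1.0):
    """Closed-form LapRLSC coefficients a* of Eq. (27).

    X   : (n, D) all points, the first l of which are labeled
    y_l : (l,) labels in {-1, +1}
    L   : (n, n) graph Laplacian built from the revised affinities w'_ik
    """
    l = y_l.shape[0]
    K = rbf_kernel(X, X, sigma)                 # n x n kernel Gram matrix
    K_l = K[:, :l]                              # columns of K for the labeled points
    A = K_l @ K_l.T + lam_A * K + lam_I * K @ L @ K
    a = np.linalg.solve(A, K_l @ y_l)           # Eq. (27)
    return a

def lap_rlsc_predict(a, X_train, X_new, sigma=1.0):
    """f(x) = sum_i a_i K(x_i, x), Eq. (23)."""
    return rbf_kernel(X_new, X_train, sigma) @ a
```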

3.3 Complexity analysis

The complexity of \(rLap\) depends upon the number of data points n, their dimension d, the number of nearest neighbors k, and the Procrustes analysis. The complexity of \(rLap\) can be derived through the following steps:

  1. The kNN algorithm requires \(O(nd+kn)\) time.Footnote 1

  2. As PCA takes \(O(d^2n+d^3)\) time and \(d \ll n\), the total time complexity is \(O(d^2n)\).

  3. The upper bound for the Procrustes analysis is \(O(d^3)\).

Thus, \(rLap\) is bounded by a complexity of \(O(n(d^2n + k(d^2n + d^3)))\). As \(n \gg k\) and \(n \gg d\), treating k and d as constants, the complexity of \(rLap\) is \(O(n^2)\).

4 Experiments and results

In this section, the proposed rotation-based Laplacian regularization technique \(rLap\)Footnote 2 is compared with existing state-of-the-art manifold learning and regularization methods on various real-world and synthetic datasets. For data visualization, various 3D synthetic datasets are projected onto 2D, and the performance is evaluated by comparing the intrinsic-dimensional representations produced by all the methods. Further, real-world classification datasets are used to train the RLSC model with all graph Laplacian variants, and their performance is evaluated through the root-mean-square error (RMSE) of Eq. (28)

$$\begin{aligned} {\mathrm{RMSE }}= \sqrt{\frac{\sum _{i=1}^{n}(\hat{y_i}-y_i)^2}{n}}, \end{aligned}$$
(28)

where \(\hat{y_i}\) is the predicted label.
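For reference, Eq. (28) in code (our own helper):

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root-mean-square error of Eq. (28)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```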

4.1 Dimensionality reduction

In the following experiments, the proposed algorithm \(rLap\) is compared with existing state-of-the-art manifold learning approaches, including the Laplacian, DM, LTSA, entropy affinity (\(EA\)) [37], \(K _5\) [32], \(K _7\) [41] and min–max–mean (\(MMM\)) [43]. The algorithms are applied to five synthetic datasets, namely swiss roll, swiss hole, punctured sphere, twin peaks, and elevated swiss roll, where an originally 2D structure is embedded in 3D space. The performance of the proposed method on dimensionality reduction shows the extent to which it preserves the local geometrical properties.
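The 2D visualizations can be produced from any of the graph Laplacians by a Laplacian-eigenmap-style embedding; the sketch below, which solves the generalized eigenproblem \(Lv = \lambda Dv\) and keeps the eigenvectors of the smallest non-zero eigenvalues, is our assumption of how such embeddings are obtained and is not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_embedding(L, D, dim=2):
    """Laplacian-eigenmap-style embedding from a graph Laplacian.

    Solves the generalized eigenproblem L v = lambda D v and keeps the
    eigenvectors of the `dim` smallest non-zero eigenvalues, skipping the
    trivial constant eigenvector.
    """
    vals, vecs = eigh(L, D)          # ascending generalized eigenvalues
    return vecs[:, 1:dim + 1]        # (n, dim) low-dimensional coordinates
```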

All datasets except the twin peaks dataset have 4000 data points; the twin peaks dataset consists of 7225 data points. The number of neighbors (k) is varied to find the best representation of the data in the lower-dimensional space. Table 1 shows the visualization results.

Swiss roll The swiss roll data is essentially a 2D flat strip that is rolled up into a 3D structure, as shown in the first column of Table 1. The dataset consists of 4000 points. Among all the methods, \(rLap\) gives the most accurate 2D representation. The Laplacian gives a reasonable representation compared to the other methods at \(k=6\). DM gives a grossly inaccurate 2D representation, exhibiting its inability to preserve global connectivity; hence the rolled strip could not be unfolded correctly.

Swiss hole The swiss hole dataset is the swiss roll dataset with a circular hole and consists of 4000 points. Here, LTSA preserves the intrinsic structure of the data the best, and the \(rLap\) method does not retain the shape of the hole. However, \(rLap\) performs better than the remaining methods, which fail to preserve the shape of the strip and give an inaccurate 2D representation with disconnected data points.

Elevated swiss roll The elevated swiss roll data is also a 2D flat strip, similar to the swiss roll but with a varying third dimension. The 2D representation obtained with \(rLap\) is better than that of all the other methods, which shows that \(rLap\) retains properties similar to the Laplacian and can be used to exploit the intrinsic geometry of the data.

Punctured sphere The surface of a punctured sphere can be represented as a 2D flat surface. The best representation from the \(rLap\) method is obtained at \(k=14\); \(rLap\) outperforms all the other methods except LTSA, and the corners it produces are smooth.

Twin peaks The twin peaks dataset, containing 7225 points, is originally a 2D flat surface with peaks at two corners. The \(rLap\) method gives results comparable to DM, which remains the best performer.

It is evident from the results that LTSA and DM cannot preserve distances and angles due to their proximity-preserving nature, and \(EA\) completely fails to unfold any dataset except the punctured sphere. The \(rLap\) method attempts to preserve distances and isometry. For the punctured sphere dataset, flattening the curved data into a flat surface violates the distance-preserving criterion, yet our method still gives comparable results.

Table 1 Dimensionality reduction: \(rLap\) versus other techniques

4.2 Real-world datasets

The real-world classification datasets cover different categories such as image, sound, text and medical data. For all the datasets, we used an RBF kernel of varying kernel width; the optimal results were found for kernel width \(\sigma = 1\). The parameters defined for the obtained manifold are \(\lambda _A\) and \(\lambda _I\), which correspond to the ambient and intrinsic geometry, respectively. During regularization, the value of \(\lambda _A\) was varied between 0.001 and 0.009 and \(\lambda _I\) between 0.01 and 0.09. For the results obtained with the other state-of-the-art methods, we fixed \(\lambda _A = 0.005\) and \(\lambda _I = 0.045\). For each class, the number of labeled examples is set to 2, selected randomly across 10 rounds of classification. The \(rLap\) method is executed with RLSC and compared with the Laplacian (Lap), p-Laplacian (p-Lap) [24], higher-order Laplacian (\(L^{m}\)) [48], ensemble manifold regularization EMR-RLS-24G (\(EMR24\)) [11], \(EA\), \(K _5\), \(K _7\) and \(MMM\) methods.

Isolet dataset The Isolet dataset is a collection of the English alphabet spoken in isolation by 150 speakers [10]. Each letter of the alphabet was spoken twice by each speaker. The speakers were divided into five groups of equal size, and the recordings of each group are referred to as Isolet1 to Isolet5. Each Isolet set has 1560 samples, and each sample is represented by 617 features. The experiment was repeated 10 times, each time with a different set of labels chosen. Isolet1 was used for training and Isolet5 for testing.

USPS handwritten digit dataset The USPS dataset contains digits (0–9) digitized from handwritten zip codes [18]. The same digits are sampled from different handwritings, giving a total of 7291 samples. In the experiment, we applied one-vs-all classification for predicting a digit. A total of 4000 images, 400 instances per digit, are used for training and the remaining 3291 images for testing.

MNIST dataset The MNIST datasetFootnote 3 consists of handwritten digits with a training set of 60,000 and a test set of 10,000 examples. In this experiment, we took all 70,000 images, of which 4000 examples per digit were used for training and the remaining 30,000 for testing.

HASYv2 dataset The HASY dataset contains single symbols, similar to MNIST. It contains 168,233 instances of 369 classes, including Arabic numerals and Latin characters, each of size \(32 \times 32\). HASYv2 has far fewer samples per class [34]. The experiment uses only the nine symbols (classes) that have at least 500 samples per class. Each symbol is classified pairwise against every other, yielding 36 binary classification problems.

BCI dataset Brain–computer interface (BCI) is an electroencephalographic mental imagery dataset [17]. The dataset contains 60 hours of EEG BCI recordings spread across 75 experiments and 13 participants, featuring 60,000 mental imagery examples in four different BCI interaction paradigms with up to 6 EEG BCI interaction states. In our experiment we used the HaLT interaction paradigm, which consists of imageries of left- and right-hand movement, left- and right-leg movement and tongue movement, for a total of six mental states to be used for interaction with the BCI. All readings of subjectA were used for pairwise classification, yielding 10 binary classification problems. A total of 1175 readings were used for training and 1275 for testing.

COIL-20 dataset The COIL-20 dataset consists of 1440 grayscale images of 20 different everyday objects, each image of size \(128\times 128\) [26]. Forty-five instances per object, i.e., 900 images, were taken for training, and the remaining 27 instances per object, i.e., 540 images, for testing.

Cifar-10 dataset The Cifar-10 dataset contains 60,000 color images of 10 classes, each of size \(32\times 32\) pixels [21]. Every class consists of 6000 images; 50,000 images are provided for training and 10,000 for testing. In this experiment, only 6000 images, i.e., 600 images per class, are used for training, and a total of 4000 images are used for testing.

Fashion MNIST dataset The Fashion MNIST [38] dataset consists of 70,000 images from 10 different fashion categories. All images are grayscale and of \(28\times 28\) pixel size. In each of the 10 rounds of the experiment, a total of 42,469 data instances were selected randomly for training and the rest were used for testing.

Lego brick dataset The Lego bricks dataset contains 16 categories of building bricks of different shapes, manufactured by Lego [12]. There are \(\approx \,400\) images for each category. The images are grayscale, and the size of each image is \(200\times 200\) pixels.

Figures 2, 3, 4, 5, 6, 7, 8, 9 and 10 show the root-mean-square error (RMSE) results for the test and unlabeled data of the datasets at \(k = 6\) for RLSC.Footnote 4 The parametric analysis of \(\lambda _A\) and \(\lambda _I\) is also depicted in the figures.

While tuning \(\lambda _A\) for a fixed \(\lambda _I\), the RMSE increases gradually with increasing \(\lambda _A\) for the MNIST, BCI, Fashion MNIST, and Lego bricks datasets, whereas it decreases for the ISOLET and USPS datasets. A change in \(\lambda _A\) does not affect the RMSE for the HASY dataset, and for the COIL20 dataset the RMSE does not change beyond \(\lambda _A =0.005\). While tuning \(\lambda _I\), the RMSE increases for all the datasets except ISOLET and USPS.

Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19 contain the RMSE values of all the methods compared with \(rLap\) for \(k=6\) to \(k=20\). For the spoken letter dataset Isolet, \(rLap\) outperforms the other methods for all the NN values, giving \(\approx 80.34\%\) and \(\approx 85.99\%\) accuracy for the test and unlabeled data, respectively.

For the handwritten datasets USPS, MNIST and HASYv2, the test data accuracy is \(\approx 97.98\%\), \(\approx 88.58\%\) and \(\approx 96.82\%\), respectively; the unlabeled data accuracy is \(\approx 98.63\%\), \(\approx 98.07\%\) and \(\approx 96.94\%\), respectively. For the USPS dataset, \(rLap\) performs better than the rest of the methods for \(k=6\) to \(k=12\); beyond that it is dominated by the \(MMM\) and \(K_7\) methods. For the MNIST dataset, \(EA\) outperforms \(rLap\) beyond NN = 12.

For the BCI test data, \(rLap\) performs better than the rest of the methods at \(k=6\) and \(k=12\); for the unlabeled data, \(EMR24\) outperforms the other methods. For both the test and unlabeled data, the classification accuracy is \(\approx 75\%\).

For the object detection datasets Fashion MNIST and Lego bricks, \(rLap\) outperforms all the other methods. The achieved accuracy for the test data is \(\approx 93.21\%\) and \(\approx 85.52\%\), respectively, whereas for the unlabeled data it is \(\approx 94.07\%\) and \(\approx 93.75\%\), respectively. For the other two object detection datasets, COIL 20 and Cifar, the test data accuracies are \(\approx 98.73\%\) and \(\approx 75.43\%\), and the unlabeled data accuracies are \(\approx 98.79\%\) and \(\approx 75.71\%\). The results for the COIL20 test data show that \(rLap\) has the minimum RMSE for \(k=6\) to \(k=16\); for the unlabeled data, \(Lap\) performs better than \(rLap\) beyond NN = 12. For the Cifar unlabeled data, \(rLap\) performs better, whereas for the test data p-Lap gives the best results from \(k=12\) onwards; still, \(rLap\) gives comparable results to p-Lap.

Fig. 2 ISOLET dataset

Table 2 Isolet test data: mean error (± standard deviation)
Table 3 Isolet unlabeled data: mean error (± standard deviation)
Fig. 3 USPS dataset

Table 4 USPS test data: mean error (± standard deviation)
Table 5 USPS unlabeled data: mean error (± standard deviation)
Fig. 4 MNIST dataset

Table 6 MNIST test data: mean error (± standard deviation)
Table 7 MNIST unlabeled data: mean error (± standard deviation)
Fig. 5 HASY dataset

Table 8 HASY test data: mean error (± standard deviation)
Table 9 HASY unlabeled data: mean error (± standard deviation)
Fig. 6 BCI dataset

Table 10 BCI test data: mean error (± standard deviation)
Table 11 BCI unlabeled data: mean error (± standard deviation)
Fig. 7 Coil20 dataset

Table 12 Coil20 test data: mean error (± standard deviation)
Table 13 Coil20 unlabeled data: mean error (± standard deviation)
Fig. 8 Cifar dataset

Table 14 Cifar test data: mean error (± standard deviation)
Table 15 Cifar unlabeled data: mean error (± standard deviation)
Fig. 9 Fashion MNIST dataset

Table 16 Fashion MNIST test data: mean error (± standard deviation)
Table 17 Fashion MNIST unlabeled data: mean error (± standard deviation)
Fig. 10 Lego bricks dataset

Table 18 Lego bricks test data: mean error (± standard deviation)
Table 19 Lego bricks unlabeled data: mean error (± standard deviation)

4.3 Medical image datasets

In medical image classification, the number of labeled datasets available is limited, so it is important for a model to classify these medical images accurately using a semi-supervised approach. In our experiment, we considered five benchmark medical image datasets for binary classification.

Mammography images dataset This dataset is taken from the Mammographic Image Analysis Society (Mini-MIAS)Footnote 5 to predict breast cancer. The dataset consists of 322 mammography images of \(1024 \times 1024\) pixels each, divided into 200 images for training and 122 for testing.

Diabetic retinopathy images This dataset is taken from the Indian Diabetic Retinopathy Image Dataset (IDRiD) Web site.Footnote 6 Diabetic retinopathy is a complication of diabetes in which elevated blood sugar damages the retina of the patient's eyes. In this experiment, a subset containing 516 images was used, where each image has a resolution of \(4288 \times 2848\) pixels. The training set consisted of 344 images and the remaining 172 images were used for testing.

Chest X-ray images This dataset contains 338 chest X-ray images,Footnote 7 used to classify data for COVID-19. Though the dataset contains images for SARS (severe acute respiratory syndrome), ARDS (acute respiratory distress syndrome) and other classes [8], we applied one-vs-all classification for predicting COVID-19 only. Out of 422 X-ray and CT scan images of 216 patients, we took only the X-ray images of 194 patients, containing 272 COVID-19 positive images.

COVID CT scan images This dataset contains COVID CT scan images.Footnote 8 A total of 349 images of 216 patients are COVID-positive and 397 images are non-COVID. The images are of different resolutions, varying from \(153 \times 124\) pixels to \(1853 \times 1485\) pixels, with an average of \(491 \times 383\) pixels. A total of 400 images are used for training and the remaining 346 for testing.

Alzheimer's Brain MRI This dataset consists of 6400 Alzheimer's brain MRI imagesFootnote 9 with a resolution of \(176 \times 208\) pixels each. The dataset contains 5121 images for training and 1279 images for testing. In Alzheimer's disease, cognitive impairment can be very mild, mild or moderate, so the dataset has four classes of images, namely non-demented, very mildly demented, mildly demented and moderately demented. In the experiment, the non-demented and very mildly demented images are considered negative and the rest positive.

The images in Fig. 11 show examples from each medical image dataset. The results obtained for these datasets at \(k=6\) are summarized in Table 20 for the test data and in Table 21 for the unlabeled data. We can conclude that the \(rLap\) method gives the best results for all the medical image unlabeled datasets. For the test datasets, \(EMR\) performs better than \(rLap\) on the Mini-MIAS dataset, whereas \(MMM\) gives the highest accuracy for the Alzheimer's dataset. For COVID-19 and diabetic retinopathy, \(rLap\) performs better than all the state-of-the-art methods.

Fig. 11 Examples of various medical image datasets

Table 20 Medical images test datasets: mean error (± standard deviation)
Table 21 Medical images unlabeled datasets: mean error

Remarks Based on the experimental results, it is evident that the proposed \(rLap\) method outperforms the existing manifold regularization methods. While the classification accuracy remains high for the spoken alphabet (Isolet) and image (Fashion MNIST, Lego bricks, and COVID-19 medical image) datasets, a moderate improvement in accuracy is achieved for the unlabeled data of many handwritten datasets. The performance of the trained model on the test data shows a significant improvement on the COIL20 dataset, indicating that the model does not overfit. For manifold learning as well, our method better unfolds the synthetic datasets in 2D. In summary, \(rLap\) performs better than the existing state-of-the-art manifold learning and regularization methods for most of the datasets.

5 Conclusion

It is known that the extent of a locally linear neighborhood on a Riemannian manifold is difficult to ascertain. In this paper, we flatten a heuristically chosen neighborhood by aligning the tangent spaces of its members, allowing the Euclidean distance to serve as a measure of pairwise affinity. Extensive experiments on both synthetic and real-world datasets show that the proposed method performs well in both manifold learning and regularization. For dimensionality reduction, our algorithm gives a better representation of the synthetic datasets than the Laplacian, LTSA, DM, EA, \(K_5\), \(K_7\) and MMM approaches. The reduced classification error for RLSC shows that the Euclidean distance computed by \(rLap\) after aligning the tangent spaces of misaligned neighbors is a good representation of affinity. However, the choice of the neighborhood size is vital for the success of the alignment-based affinity measure.