Single- and Multi-Distribution Dimensionality Reduction Approaches for a Better Data Structure Capturing
IEEE Access (IF 3.9), Pub Date: 2020-01-01, DOI: 10.1109/access.2020.3038460
Laureta Hajderanj, Daqing Chen, Enrico Grisan, Sandra Dudley

In recent years, the huge expansion of digital technologies has vastly increased the volume of data to be explored, making dimensionality reduction an essential step in data exploration. The quality of a dimensionality reduction technique depends on how well it preserves the structure of the data. Techniques such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) preserve the global distance ranking at the expense of neglecting small distances. Conversely, the structure captured by methods such as Isomap, Locally Linear Embedding (LLE), Laplacian Eigenmaps, t-Distributed Stochastic Neighbour Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and TriMap depends on the number of neighbours considered. This paper presents a dimensionality reduction technique, Same Degree Distribution (SDD), that does not rely on the number of neighbours because it uses degree-distributions in both the high- and low-dimensional spaces. The degree-distribution is similar to the Student-t distribution and is less expensive to compute than the Gaussian distribution, so it enables better global data preservation in less processing time. Moreover, to improve data structure capturing, SDD has been extended to Multi-SDDs (MSDD), which employs various degree-distributions on top of SDD. The proposed approach and its extension outperformed eight other benchmark methods on several popular synthetic and real datasets, such as Iris, Breast Cancer, Swiss Roll, MNIST, and Make Blob, as evaluated by the co-ranking matrix and Kendall's Tau coefficient. In future work, we aim to approximate the number of distributions and their degrees for a given dataset, and to reduce the computational complexity.
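The abstract names the ingredients of SDD (one heavy-tailed kernel used in both spaces, no neighbourhood-size parameter) but gives no formulas. The following is a minimal sketch under loudly stated assumptions, not the authors' method: the degree-distribution is replaced by a Student-t kernel with `dof` degrees of freedom, the same `dof` is used in the high- and low-dimensional spaces, and the embedding minimises the KL divergence between the two normalised affinity matrices by gradient descent with momentum, in the style of t-SNE. The function names, `dof`, the learning rate, and the momentum schedule are hypothetical choices.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def t_kernel_affinities(X, dof):
    """Joint affinities from a Student-t kernel with `dof` degrees of freedom.

    ASSUMPTION: this kernel stands in for the paper's degree-distribution.
    """
    D = pairwise_sq_dists(X)
    W = (1.0 + D / dof) ** (-(dof + 1.0) / 2.0)
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W / W.sum()        # normalise over all ordered pairs


def sdd_like_embed(X, dof=1.0, n_components=2, n_iter=500, lr=50.0, seed=0):
    """Hypothetical SDD-style embedding: same t-kernel in both spaces, minimise KL(P || Q)."""
    rng = np.random.default_rng(seed)
    P = t_kernel_affinities(X, dof)
    Y = rng.normal(scale=1e-2, size=(X.shape[0], n_components))
    velocity = np.zeros_like(Y)
    c = (2.0 * dof + 2.0) / dof  # gradient constant; equals 4 when dof = 1, as in t-SNE
    for it in range(n_iter):
        U = 1.0 / (1.0 + pairwise_sq_dists(Y) / dof)  # (1 + d^2 / dof)^(-1)
        W = U ** ((dof + 1.0) / 2.0)                  # same t-kernel in the low-dim space
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()
        M = (P - Q) * U  # diagonal is zero because P and Q have zero diagonals
        # dC/dy_i = c * sum_j M_ij (y_i - y_j), written in matrix form
        grad = c * (M.sum(axis=1)[:, None] * Y - M @ Y)
        momentum = 0.5 if it < 100 else 0.8
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(6, 1, (50, 10))])
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise features first
    Y = sdd_like_embed(X)
    print(Y.shape)  # (100, 2)
```

The only tuning knob in this sketch is `dof`; there is no perplexity or neighbour count. Skipping the per-point bandwidth search that Gaussian affinities in t-SNE require is one plausible reading of the cost advantage the abstract mentions, but that interpretation is an assumption. Standardising the features keeps the high-dimensional distances on a scale where the kernel is not flat.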

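The two evaluation criteria named in the abstract can be sketched as follows. The co-ranking matrix follows the standard Lee and Verleysen definition: entry (k, l) counts the pairs (i, j) where j is the k-th neighbour of i in the original space and the l-th neighbour in the embedding, and Q_NX(K) summarises the upper-left K x K block. ASSUMPTION: the abstract does not say which quantities Kendall's Tau is computed on, so here it compares the pairwise distances of the two spaces (tau-a, without tie correction). All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def neighbour_ranks(X):
    """R[i, j] = rank of point j among the neighbours of i (self = 0, nearest = 1)."""
    D = pairwise_sq_dists(X)
    np.fill_diagonal(D, -np.inf)  # force each point to rank itself first
    order = np.argsort(D, axis=1, kind="stable")
    n = X.shape[0]
    R = np.empty((n, n), dtype=np.int64)
    R[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return R


def coranking_matrix(X_high, X_low):
    """Q[k-1, l-1] counts pairs with rank k in X_high and rank l in X_low."""
    n = X_high.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    r_high = neighbour_ranks(X_high)[off_diag] - 1
    r_low = neighbour_ranks(X_low)[off_diag] - 1
    Q = np.zeros((n - 1, n - 1), dtype=np.int64)
    np.add.at(Q, (r_high, r_low), 1)
    return Q


def q_nx(Q, K):
    """Average fraction of K-nearest neighbours preserved (1.0 = perfect)."""
    n = Q.shape[0] + 1
    return Q[:K, :K].sum() / (K * n)


def kendall_tau_distances(X_high, X_low):
    """Kendall's tau-a between pairwise distances; O(m^2) memory with m = n(n-1)/2."""
    iu = np.triu_indices(X_high.shape[0], k=1)
    a = pairwise_sq_dists(X_high)[iu]  # squared distances give the same ranking
    b = pairwise_sq_dists(X_low)[iu]
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    m = a.size
    return np.triu(sa * sb, k=1).sum() / (m * (m - 1) / 2)
```

For example, `q_nx(coranking_matrix(X, Y), 10)` scores how many of each point's 10 nearest neighbours survive the embedding `Y`. The quadratic Kendall's Tau helper only suits a few dozen points; for larger datasets `scipy.stats.kendalltau(a, b)` computes the tie-corrected tau-b without building the m x m sign matrices.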
Updated: 2020-01-01