ShapeVis: High-dimensional Data Visualization at Scale
arXiv - CS - Human-Computer Interaction Pub Date : 2020-01-15 , DOI: arxiv-2001.05166
Nupur Kumari, Siddarth R., Akash Rupela, Piyush Gupta, Balaji Krishnamurthy

We present ShapeVis, a scalable visualization technique for point cloud data inspired by topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. The data visualization technique Mapper, which discretely approximates the Reeb graph of a filter function on the data, has been applied with much success. However, when standard dimensionality reduction algorithms are used as the filter function, Mapper incurs considerable computational cost, which makes it difficult to scale to high-dimensional data. Our proposed technique finds a subset of points, called landmarks, along the data manifold and constructs a weighted witness graph over them. This graph captures the structural characteristics of the point cloud, and its weights are determined using a finite Markov chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of the data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions such as t-SNE, UMAP, and LargeVis, and show that our algorithm scales to millions of data points while preserving the quality of the visualization.
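
The landmark and witness-graph step described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how landmarks are chosen or give the Markov chain weighting, so the sketch assumes greedy farthest-point (maxmin) sampling for landmarks and uses a simple witness count as the edge weight instead. The function names maxmin_landmarks, witness_graph, and make_circle are hypothetical, and the community detection and edge pruning stages are omitted.

```python
import numpy as np


def maxmin_landmarks(points, n_landmarks, seed=0):
    """Pick landmark indices by greedy farthest-point (maxmin) sampling.

    Assumption: the abstract does not specify the landmark selection rule.
    """
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(points)))
    chosen = [first]
    # Distance from every point to its nearest landmark chosen so far.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(n_landmarks - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(chosen)


def witness_graph(points, landmark_idx):
    """Link two landmarks when some point has them as its two nearest landmarks.

    Assumption: the edge weight here is the number of witnessing points,
    a stand-in for the finite Markov chain weights named in the abstract.
    """
    landmarks = points[landmark_idx]
    # Pairwise distances, shape (n_points, n_landmarks).
    dist = np.linalg.norm(points[:, None, :] - landmarks[None, :, :], axis=2)
    nearest_two = np.argsort(dist, axis=1)[:, :2]
    edges = {}
    for a, b in nearest_two:
        key = (int(min(a, b)), int(max(a, b)))
        edges[key] = edges.get(key, 0) + 1
    return edges


def make_circle(n=400, noise=0.02, seed=0):
    """Synthetic point cloud: a noisy unit circle in the plane."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    xy = np.column_stack([np.cos(theta), np.sin(theta)])
    return xy + rng.normal(scale=noise, size=xy.shape)


points = make_circle()
landmark_idx = maxmin_landmarks(points, n_landmarks=12)
edges = witness_graph(points, landmark_idx)
for (a, b), weight in sorted(edges.items()):
    print(f"landmark {a} -- landmark {b}: {weight} witnesses")
```

Each point witnesses exactly one edge, so the weights sum to the number of points. The dense distance matrix keeps the sketch short but would not scale to millions of points; a nearest-neighbor index would be needed for that.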

Updated: 2020-01-22