Scaling the Growing Neural Gas for Visual Cluster Analysis
Big Data Research (IF 3.3), Pub Date: 2021-08-12, DOI: 10.1016/j.bdr.2021.100254
Elio Ventocilla, Rafael M. Martins, Fernando Paulovich, Maria Riveiro

The growing neural gas (GNG) is an unsupervised topology-learning algorithm that models a data space through interconnected units placed in the most densely populated regions of that space. Its output is a graph that can be drawn on a two-dimensional plane to reveal cluster patterns in a dataset. When trained on high-dimensional data, however, GNG commonly produces highly connected graphs, which lead to cluttered 2D representations that may fail to reveal meaningful patterns. Moreover, its sequential learning limits its potential for faster execution on local datasets and, more importantly, for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside.
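
To make the description of units and connections concrete, the following is a minimal sketch of the baseline GNG training loop as it is commonly described, not of the improved methods this paper proposes. The function name `train_gng`, every parameter name, and all default values are illustrative assumptions rather than settings taken from the paper.

```python
import random

def train_gng(data, max_units=20, steps=2000, eps_winner=0.05,
              eps_neighbor=0.006, max_age=50, insert_every=100,
              alpha=0.5, decay=0.995, seed=0):
    """Baseline GNG sketch. Returns (units, edges):
    units maps unit id -> position, edges maps frozenset({a, b}) -> age."""
    rng = random.Random(seed)
    data = list(data)
    first, second = rng.sample(data, 2)      # start from two data points
    units = {0: list(first), 1: list(second)}
    errors = {0: 0.0, 1: 0.0}                # accumulated error per unit
    edges = {}
    next_id = 2

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def neighbors(u):
        return [v for e in edges if u in e for v in e if v != u]

    def move(u, target, rate):
        units[u] = [w + rate * (t - w) for w, t in zip(units[u], target)]

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Nearest (s1) and second-nearest (s2) units to the sample.
        s1, s2 = sorted(units, key=lambda u: dist2(units[u], x))[:2]
        errors[s1] += dist2(units[s1], x)
        # Pull the winner and its neighbors toward the sample,
        # and age every edge that touches the winner.
        move(s1, x, eps_winner)
        for n in neighbors(s1):
            move(n, x, eps_neighbor)
            edges[frozenset((s1, n))] += 1
        edges[frozenset((s1, s2))] = 0       # create or refresh the edge
        # Drop edges that are too old, then units left without edges.
        for e in [e for e, age in edges.items() if age > max_age]:
            del edges[e]
        linked = {u for e in edges for u in e}
        for u in [u for u in units if u not in linked]:
            del units[u], errors[u]
        # Periodically insert a unit between the unit with the largest
        # error (q) and its highest-error neighbor (f).
        if step % insert_every == 0 and len(units) < max_units:
            q = max(errors, key=errors.get)
            f = max(neighbors(q), key=errors.get)
            units[next_id] = [(a + b) / 2 for a, b in zip(units[q], units[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, next_id))] = 0
            edges[frozenset((f, next_id))] = 0
            errors[q] *= alpha
            errors[f] *= alpha
            errors[next_id] = errors[q]
            next_id += 1
        for u in errors:                     # global error decay
            errors[u] *= decay
    return units, edges
```

The returned `units` and `edges` form the graph that would then be laid out on a 2D plane. Note that each sample is processed one after another, which is the sequential behavior the abstract identifies as a scaling limitation.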

This paper presents two methods that improve GNG for visualizing cluster patterns in large-scale, high-dimensional datasets. The first aims to provide more accurate and meaningful 2D visual representations of cluster patterns in high-dimensional datasets by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may in turn cause overplotting and clutter. The second enables GNG to be used on big, distributed datasets with faster execution times by modeling separate parts of a dataset and merging the results using the MapReduce model.
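
The abstract does not say how the separately modeled parts are merged, so the sketch below only illustrates the general map-then-merge shape of such a pipeline. Both steps are hypothetical stand-ins: `summarize` replaces local GNG training with a simple grid-cell centroid summary, and `merge` fuses prototypes that lie within an assumed `radius`. Neither is the paper's actual procedure.

```python
from functools import reduce

def summarize(partition, cell=1.0):
    """Map step (stand-in for training a local model on one partition).
    Groups points by grid cell and returns (centroid, count) prototypes."""
    cells = {}
    for p in partition:
        key = tuple(int(c // cell) for c in p)
        total, n = cells.get(key, ([0.0] * len(p), 0))
        cells[key] = ([t + c for t, c in zip(total, p)], n + 1)
    return [([t / n for t in total], n) for total, n in cells.values()]

def merge(a, b, radius=0.5):
    """Reduce step (hypothetical merge rule). Folds the prototypes of b
    into a, taking a count-weighted average of any pair within radius."""
    merged = list(a)
    for pos, n in b:
        for i, (mpos, mn) in enumerate(merged):
            if sum((x - y) ** 2 for x, y in zip(pos, mpos)) <= radius ** 2:
                w = mn + n
                merged[i] = ([(x * mn + y * n) / w
                              for x, y in zip(mpos, pos)], w)
                break
        else:
            merged.append((pos, n))          # no close match: keep as new
    return merged

def map_reduce_model(partitions):
    """Summarize each partition independently, then merge pairwise."""
    return reduce(merge, map(summarize, partitions))
```

Because `summarize` sees only its own partition, the map step could run wherever each part of the data resides, and only the much smaller prototype lists would need to be moved for merging.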

Quantitative and qualitative evaluations show that the first method creates lower-dimensional graph structures that provide more meaningful (and sometimes more accurate) cluster representations with less overplotting and clutter, and that the second method preserves the accuracy and meaning of the cluster representations while enabling GNG to run in large-scale and distributed settings.




Updated: 2021-08-27