Interactive Exploration of Large Dendrograms with Prototypes
arXiv - STAT - Other Statistics Pub Date : 2022-06-03 , DOI: arxiv-2206.01703
Andee Kaplan, Jacob Bien

Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a data set. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter data sets whose large scale undermines the usefulness of the dendrogram as a visualization tool. Densely packed branches obscure structure, and overlapping labels are impossible to read. In this paper we present a new workflow for performing hierarchical clustering via the R package called protoshiny that aims to restore hierarchical clustering to its former role of being an effective and versatile visualization tool. Our proposal leverages interactivity combined with the ability to label internal nodes in a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility.
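The abstract does not state how a prototype is chosen for an internal node. One common definition, used in minimax linkage, picks the cluster member whose farthest cluster-mate is as close as possible. The sketch below illustrates that idea only, under the assumption of Euclidean distance; it is not the protoshiny implementation, and the function name `minimax_prototype` is hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def minimax_prototype(points):
    """Return (index, radius) for the point whose farthest cluster-mate is closest.

    Assumption: the prototype is the minimax point of the cluster. The abstract
    does not confirm this rule; it is used here purely for illustration.
    """
    best_index, best_radius = None, float("inf")
    for i, candidate in enumerate(points):
        # Radius of the cluster if `candidate` were chosen as its representative.
        radius = max(dist(candidate, other) for other in points)
        if radius < best_radius:
            best_index, best_radius = i, radius
    return best_index, best_radius


# A toy cluster of four 2-D points (made-up data).
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
index, radius = minimax_prototype(cluster)
print(cluster[index], radius)
```

Labeling an internal node with such a point gives a reader one concrete observation to stand for everything beneath that node, which is the role the abstract assigns to a prototype.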

Updated: 2022-06-06