Dynamic Visualization and Fast Computation for Convex Clustering via Algorithmic Regularization
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2019-07-19, DOI: 10.1080/10618600.2019.1629943
Michael Weylandt, John Nagorski, Genevera I. Allen

Abstract: Convex clustering is a promising new approach to the classical problem of clustering, combining strong performance in empirical studies with rigorous theoretical foundations. Despite these advantages, convex clustering has not been widely adopted, due to its computationally intensive nature and its lack of compelling visualizations. To address these impediments, we introduce Algorithmic Regularization, an innovative technique for obtaining high-quality estimates of regularization paths using an iterative one-step approximation scheme. We justify our approach with a novel theoretical result, guaranteeing global convergence of the approximate path to the exact solution under easily checked non-data-dependent assumptions. The application of algorithmic regularization to convex clustering yields the Convex Clustering via Algorithmic Regularization Paths (CARP) algorithm for computing the clustering solution path. On example datasets from genomics and text analysis, CARP delivers more than a 100-fold speed-up over existing methods, while attaining a finer approximation grid than standard methods. Furthermore, CARP enables improved visualization of clustering solutions: the fine solution grid returned by CARP can be used to construct a convex clustering-based dendrogram, and it forms the basis of a dynamic path-wise visualization based on modern web technologies. Our methods are implemented in the open-source R package clustRviz, available at https://github.com/DataSlingers/clustRviz. Supplementary materials for this article are available online.
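
The "iterative one-step approximation scheme" mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation and does not use the clustRviz API. It assumes a standard ADMM splitting of the convex clustering objective, one-dimensional data, uniform fusion weights over all pairs, and an L1 fusion penalty. The function names (`soft_threshold`, `carp_path_1d`) and the parameters (`rho`, `gamma0`, `t`, `max_steps`) are hypothetical and chosen only for illustration. The idea shown is that, instead of running ADMM to convergence at each regularization level, a single ADMM step is taken and the regularization level is then multiplied by a factor slightly above 1.

```python
from itertools import combinations


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| (scalar soft-thresholding)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def carp_path_1d(x, rho=1.0, gamma0=1e-4, t=1.05, max_steps=10000):
    """Approximate a 1-D convex clustering path with one ADMM step per level.

    Sketch objective (an assumption, uniform weights over all pairs):
        0.5 * sum_i (x_i - u_i)^2 + gamma * sum_{i<j} |u_i - u_j|
    Returns a list of (gamma, centroids, number_of_fused_pairs) tuples.
    """
    n = len(x)
    pairs = list(combinations(range(n), 2))
    v = [x[i] - x[j] for i, j in pairs]  # split variable, v = D u
    z = [0.0] * len(pairs)               # scaled dual variable
    gamma = gamma0
    path = []
    for _ in range(max_steps):
        # u-update: solve (I + rho * D^T D) u = x + rho * D^T (v - z).
        # With all pairs present, D^T D = n*I - 1 1^T, so the solve has
        # the closed form u = (b + rho * sum(b)) / (1 + rho * n).
        b = list(x)
        for k, (i, j) in enumerate(pairs):
            r = rho * (v[k] - z[k])
            b[i] += r
            b[j] -= r
        total = sum(b)
        u = [(b_i + rho * total) / (1.0 + rho * n) for b_i in b]
        # v-update (soft-thresholding) and dual update, once per level.
        for k, (i, j) in enumerate(pairs):
            diff = u[i] - u[j]
            v[k] = soft_threshold(diff + z[k], gamma / rho)
            z[k] += diff - v[k]
        n_fused = sum(1 for v_k in v if v_k == 0.0)
        path.append((gamma, u, n_fused))
        if n_fused == len(pairs):
            break  # every pairwise difference has been shrunk to zero
        gamma *= t  # move to the next regularization level immediately
    return path


if __name__ == "__main__":
    data = [0.0, 0.1, 5.0, 5.2]
    for gamma, centroids, n_fused in carp_path_1d(data)[::25]:
        print(round(gamma, 5), [round(c, 3) for c in centroids], n_fused)
```

In this sketch a multiplier `t` closer to 1 gives a finer grid of regularization levels at the cost of more steps. The fusion pattern recorded along the path (which pairwise differences are zero at each level) is the kind of information a dendrogram could be built from.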

Updated: 2019-07-19