Estimating the number of clusters using cross-validation
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2019-09-30, DOI: 10.1080/10618600.2019.1647846
Wei Fu, Patrick O. Perry

Abstract Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong modeling assumptions. This article proposes a data-driven approach to estimate the number of clusters based on a novel form of cross-validation. The proposed method differs from ordinary cross-validation, because clustering is fundamentally an unsupervised learning problem. Simulation and real data analysis results show that the proposed method outperforms existing methods, especially in high-dimensional settings with heterogeneous or heavy-tailed noise. In a yeast cell cycle dataset, the proposed method finds a parsimonious clustering with interpretable gene groupings. Supplementary materials for this article are available online.

Updated: 2019-09-30