Computational Statistics (IF 1.3), Pub Date: 2020-05-18, DOI: 10.1007/s00180-020-00981-5. Jonas M. B. Haslbeck, Dirk U. Wulff
We improve instability-based methods for selecting the number of clusters k in cluster analysis by developing a corrected clustering distance that removes the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming the limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
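To make the idea of cluster instability concrete, the following is a minimal Python sketch of a generic, uncorrected, model-free instability estimate. It is not the corrected measure proposed in the paper and not the API of the cstab package. The function names `clustering_distance` and `instability`, the `cluster_fn` callback, and the choice to compare two resamples only on the points they share are all assumptions made for illustration.

```python
import random
from itertools import combinations


def clustering_distance(labels_a, labels_b):
    """Share of point pairs that are co-clustered in one labeling but not in the other.

    Illustrative, uncorrected distance; label names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


def instability(data, k, cluster_fn, n_pairs=20, seed=0):
    """Average clustering distance over pairs of resampled data sets.

    cluster_fn(points, k) is a hypothetical callback that returns one label
    per point. Each resample keeps the distinct indices of a bootstrap draw,
    and the two clusterings are compared on the indices present in both.
    """
    rng = random.Random(seed)
    n = len(data)
    distances = []
    for _ in range(n_pairs):
        idx_a = sorted({rng.randrange(n) for _ in range(n)})
        idx_b = sorted({rng.randrange(n) for _ in range(n)})
        lab_a = dict(zip(idx_a, cluster_fn([data[i] for i in idx_a], k)))
        lab_b = dict(zip(idx_b, cluster_fn([data[i] for i in idx_b], k)))
        shared = [i for i in idx_a if i in lab_b]
        distances.append(
            clustering_distance(
                [lab_a[i] for i in shared], [lab_b[i] for i in shared]
            )
        )
    return sum(distances) / len(distances)
```

Under these assumptions, one would compute `instability(data, k, cluster_fn)` for each candidate k and prefer the k with the lowest value. The abstract's point is that such an uncorrected distance also depends on the distribution of cluster sizes, which is the influence the corrected distance is designed to remove.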
Title: Estimating the number of clusters via a corrected clustering instability