Degrees of freedom and model selection for k-means clustering, Computational Statistics & Data Analysis



Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2020-09-01, DOI: 10.1016/j.csda.2020.106974
David P. Hofmeyr

This paper investigates the model degrees of freedom in k-means clustering. An extension of Stein's lemma provides an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression; however, empirical studies support the appropriateness of the proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high quality clustering solutions. Code to implement the proposed approach is available in the form of an R package from this https URL.
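
To make the idea of BIC-based selection of k concrete, the following is a minimal Python sketch, not the paper's method. The abstract does not state the effective degrees of freedom expression, so the sketch substitutes the naive parameter count k·d (k centroids in d dimensions) and assumes a spherical Gaussian model with a common variance, giving BIC = n·d·log(SSE/(n·d)) + log(n)·df. The function names `kmeans_sse`, `naive_bic`, and `make_blobs`, and the toy data, are hypothetical and chosen only for illustration; the paper's approach would replace the `df` line with its own estimate.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, n_restarts=5, n_iter=50, seed=0):
    """Lloyd's algorithm with random restarts; returns the lowest SSE found."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_restarts):
        centers = rng.sample(points, k)  # initialise at k distinct data points
        for _ in range(n_iter):
            # Assignment step: put each point in the group of its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Update step: move each center to its group mean
            # (an empty group keeps its previous center).
            new_centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


def naive_bic(points, k):
    """BIC under an assumed spherical Gaussian model with common variance."""
    n, d = len(points), len(points[0])
    sse = kmeans_sse(points, k)  # assumed > 0, i.e. k is well below n
    # ASSUMPTION: naive parameter count, NOT the paper's effective
    # degrees of freedom, which the abstract does not spell out.
    df = k * d
    return n * d * math.log(sse / (n * d)) + math.log(n) * df


def make_blobs(centers, n_per, seed=1):
    """Toy data: unit-variance Gaussian blobs around the given 2-D centers."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(n_per)
    ]


points = make_blobs([(0, 0), (10, 0), (0, 10)], n_per=100)
bic = {k: naive_bic(points, k) for k in range(1, 7)}
best_k = min(bic, key=bic.get)  # the candidate k with the lowest BIC
```

Because the penalty enters only through `df`, comparing selection criteria amounts to swapping that one term while leaving the fitting and likelihood parts unchanged.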


Updated: 2020-09-01
