K-CDFs: A Nonparametric Clustering Algorithm via Cumulative Distribution Function,Journal of Computational and Graphical Statistics



K-CDFs: A Nonparametric Clustering Algorithm via Cumulative Distribution Function
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-07-21, DOI: 10.1080/10618600.2022.2091575
Jicai Liu₁, Jinhong Li₂, Riquan Zhang₂


ABSTRACT

We propose a novel partitioning clustering procedure based on the cumulative distribution function (CDF), called K-CDFs. For univariate data, the K-CDFs procedure represents cluster centers by empirical CDFs and assigns each observation to the closest center as measured by the Cramér-von Mises distance. The procedure is nonparametric and does not require the assumptions on cluster distributions that mixture models impose. A projection technique generalizes K-CDFs from univariate data to arbitrary dimensions. The proposed procedure has several appealing properties: it is robust to heavy-tailed data, is not sensitive to the data dimension, requires no moment conditions on the data, and can effectively detect linearly nonseparable clusters. To implement K-CDFs, we propose two algorithms: a greedy algorithm analogous to the classical Lloyd's algorithm, and a spectral relaxation algorithm. We illustrate the finite-sample performance of the proposed algorithms through simulation experiments and empirical analyses of several real datasets. Supplementary files for this article are available online.
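The assignment and update steps can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions:

- The Cramér-von Mises integral is taken with respect to the pooled empirical distribution of the (projected) sample.
- The multivariate extension averages the univariate distance over random unit directions, controlled by a hypothetical parameter `n_dirs`.
- Centers are initialized at randomly chosen observations.

Under these assumptions, each observation maps to an indicator vector (the CDF of a point mass), and a cluster's center is the mean of its members' vectors, i.e. their empirical CDF. The greedy step then becomes Lloyd's algorithm on that representation. The function names `cdf_features` and `k_cdfs_lloyd` are hypothetical, and the spectral relaxation algorithm is not shown.

```python
import numpy as np

# Hypothetical helper names; this is a sketch, not the paper's supplementary code.


def cdf_features(X, directions):
    """Indicator features 1{a'x_i <= a'x_j} for every projection direction a.

    Row i is the CDF of a point mass at the projected observation i, evaluated
    at all projected sample points; blocks for different directions are
    stacked side by side.
    """
    blocks = []
    for a in directions:
        z = X @ a  # project the sample onto direction a
        blocks.append((z[:, None] <= z[None, :]).astype(float))
    return np.hstack(blocks)


def k_cdfs_lloyd(X, k, n_dirs=20, n_iter=100, seed=0):
    """Lloyd-style alternation between ECDF centers and nearest-center assignment."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    if p == 1:
        directions = np.ones((1, 1))  # univariate case: no projection needed
    else:
        # Assumption: random unit directions stand in for the paper's projections.
        directions = rng.standard_normal((n_dirs, p))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    phi = cdf_features(X, directions)

    # Initialize each center with the point-mass CDF of a random observation.
    centers = phi[rng.choice(n, size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Averaging squared CDF differences over the pooled sample points
        # (and over directions) gives a Cramer-von Mises-type distance.
        dist = np.stack([((phi - row) ** 2).mean(axis=1) for row in centers], axis=1)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the empirical CDF of its members;
        # an empty cluster keeps its previous center.
        centers = np.stack([
            phi[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
    return labels
```

A call such as `k_cdfs_lloyd(X, k=3)` takes an `(n, p)` array (or a 1-D array) and returns integer labels. The feature matrix has `n * n * n_dirs` entries, so this sketch suits only small samples.

Because the features depend only on the ordering of the projected values, a strictly increasing transform of univariate data (for example `x**3`) leaves the labels unchanged in this sketch. This is consistent with the claim that no moment conditions are needed.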


Updated: 2022-07-21
