Sparse probabilistic K-means, Applied Mathematics and Computation



Sparse probabilistic K-means
Applied Mathematics and Computation (IF 3.5), Pub Date: 2020-10-01, DOI: 10.1016/j.amc.2020.125328
Yoon Mo Jung, Joyce Jiyoung Whang, Sangwoon Yun

Abstract: The goal of clustering is to partition a set of data points into groups of similar data points, called clusters. Clustering algorithms can be classified into two categories: hard and soft clustering. Hard clustering assigns each data point to exactly one cluster, whereas soft clustering allows probabilistic assignments to clusters. In this paper, we propose a new model that combines the benefits of both: the clarity of hard clustering and the probabilistic assignments of soft clustering. Since most data points usually have a clear cluster association, only a few require a probabilistic interpretation. Thus, we apply an ℓ1-norm constraint to impose sparsity on the probabilistic assignments. Moreover, we incorporate outlier detection into the clustering model, so that outliers, which can cause serious problems in statistical analyses, are detected simultaneously. To optimize the model, we introduce an alternating minimization method and prove its convergence. Numerical experiments and comparisons with existing models show the soundness and effectiveness of the proposed model.
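
The abstract does not state the objective function, so the following is only a rough sketch of the general idea (sparse probabilistic assignments found by alternating minimization), not the authors' model. As a stand-in for the paper's ℓ1-norm constraint it assumes a quadratic penalty on the memberships, which makes each assignment update a Euclidean projection onto the probability simplex and therefore sets many memberships exactly to zero. Outlier detection is omitted. All names (`project_to_simplex`, `sparse_soft_kmeans`, `gamma`) and the toy data are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {u : u >= 0, sum(u) == 1}."""
    ordered = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        shift = (cumulative - 1.0) / j
        if value - shift > 0:  # holds for a prefix; keep the last shift that passes
            theta = shift
    return [max(x - theta, 0.0) for x in v]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def objective(points, centers, U, gamma):
    """Assumed objective: sum_ik u_ik * ||x_i - c_k||^2 + (gamma / 2) * u_ik^2."""
    total = 0.0
    for x, u in zip(points, U):
        for c, w in zip(centers, u):
            total += w * sq_dist(x, c) + 0.5 * gamma * w * w
    return total


def sparse_soft_kmeans(points, centers, gamma=10.0, iters=20):
    """Alternate between exact minimization over memberships and over centers."""
    centers = [list(c) for c in centers]
    dim = len(centers[0])
    U, history = [], []
    for _ in range(iters):
        # Assignment step: for fixed centers, the minimizer for each point is
        # the simplex projection of the negated, scaled squared distances.
        U = [
            project_to_simplex([-sq_dist(x, c) / gamma for c in centers])
            for x in points
        ]
        # Center step: for fixed memberships, each center is a weighted mean.
        for k in range(len(centers)):
            weight = sum(u[k] for u in U)
            if weight > 0:
                centers[k] = [
                    sum(u[k] * x[d] for x, u in zip(points, U)) / weight
                    for d in range(dim)
                ]
        history.append(objective(points, centers, U, gamma))
    return centers, U, history


# Toy data: two tight groups plus one point halfway between them.
points = [[0.0], [0.2], [-0.2], [10.0], [10.2], [9.8], [5.0]]
centers, U, history = sparse_soft_kmeans(points, [[0.0], [10.0]])
for x, u in zip(points, U):
    print(x, [round(w, 3) for w in u])
```

In this sketch, points close to one center end up with a single nonzero membership, as in hard clustering, while the in-between point keeps a genuinely split membership. Because both steps minimize the assumed objective exactly, the recorded objective values do not increase from one iteration to the next. A larger `gamma` spreads memberships over more clusters; a smaller one pushes the result toward plain K-means.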


Updated: 2020-10-01
