IM-c-means: a new clustering algorithm for clusters with skewed distributions
Pattern Analysis and Applications (IF 3.9), Pub Date: 2020-11-06, DOI: 10.1007/s10044-020-00932-2
Yun Liu, Tao Hou, Yan Miao, Meihe Liu, Fu Liu

In this paper, a new clustering algorithm, IM-c-means, is proposed for clusters with skewed distributions. The c-means algorithm is a well-known and widely used strategy for data clustering, but it is prone to poor performance when the data set is not distributed uniformly, a problem known in the literature as the “uniform effect”. We first analyze the cause of this effect and find that it occurs only when cluster sizes vary, whereas differences in object density between clusters have no effect on the c-means algorithm. Based on this finding, we propose a new objective function that takes into account the volumes and object densities of all clusters. The resulting algorithm is effective for clusters of varied sizes or densities while retaining the good performance of the traditional c-means algorithm on balanced data sets. Experiments on both synthetic and real data sets give promising results for the proposed algorithm. In addition, a nonparametric test shows that the proposed algorithm can offer a significant improvement over other clustering methods on imbalanced data sets.
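
To make the "uniform effect" concrete, the following is a minimal sketch, not the authors' IM-c-means, whose objective function is not given in this abstract. It assumes that "c-means" can be illustrated with plain hard c-means (Lloyd-style k-means), and the function name `hard_c_means`, the synthetic data, and all parameter values are hypothetical choices made for illustration. It builds one large, spread-out cluster and one small, compact cluster, then shows that the standard algorithm tends to hand part of the large cluster to the small one.

```python
import numpy as np


def hard_c_means(X, centers, n_iter=100):
    """Plain hard c-means (Lloyd iterations); an illustrative baseline only."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, n_clusters).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical imbalanced data: 1000 widely spread points and 50 tightly packed points.
rng = np.random.default_rng(0)
big = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(1000, 2))
small = rng.normal(loc=[8.0, 0.0], scale=0.5, size=(50, 2))
X = np.vstack([big, small])

# Start from the true cluster locations, so any error is not due to bad initialization.
labels, centers = hard_c_means(X, centers=[[0.0, 0.0], [8.0, 0.0]])
sizes = np.bincount(labels, minlength=2)

print("true cluster sizes: ", [1000, 50])
print("found cluster sizes:", sizes.tolist())
print("found centers:\n", centers)
```

Even when initialized at the true centers, the second center is pulled toward the large cluster and absorbs some of its points, so the recovered clusters are more even in size than the true ones. Note that this toy data varies both the number of objects and the spread of the two clusters at once, so it does not separate the size and density effects that the paper analyzes.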


Updated: 2020-11-09