Clustering Ensemble Based on Sample’s Certainty
Cognitive Computation (IF 5.4), Pub Date: 2021-05-25, DOI: 10.1007/s12559-021-09876-z
Xia Ji, Shuaishuai Liu, Peng Zhao, Xuejun Li, Qiong Liu

The objective of clustering ensemble is to fuse multiple base partitions (BPs) to find the underlying data structure. It has been observed that a sample can change its neighbors across different BPs, and that the stability of these neighbor relationships differs from sample to sample. This difference suggests that samples may contribute differently to the detection of the underlying data structure. In addition, clustering ensemble aims to integrate the inconsistent parts of BPs by first extracting the consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither the stability of a sample’s neighbor relationships nor whether the sample belongs to the consistent or the inconsistent part of the BPs. To address these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate this certainty. We then develop a clustering ensemble algorithm based on the sample’s certainty. It rests on the following idea, which forms the basis of the clustering ensemble process: the neighbor relationships of a cluster core are more stable across BPs, and different cluster cores usually do not form neighbor relationships in BPs. According to the sample’s certainty, the algorithm divides a dataset into two subsets: cluster core samples and cluster halo samples. It then discovers a clear core structure using the cluster core samples and gradually assigns the cluster halo samples to that core structure. Experiments on six synthetic datasets illustrate how the algorithm works. On twelve real datasets, the algorithm outperforms twelve state-of-the-art clustering ensemble algorithms.
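
The core/halo pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the certainty formula, so everything specific here is an assumption made for illustration and not the authors' method: the co-association matrix, the score `mean |2a - 1|` used as "certainty", both thresholds, the connected-components step for the core structure, and the function names `co_association`, `sample_certainty`, and `core_halo_ensemble`.

```python
from itertools import combinations


def co_association(base_partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    m = len(base_partitions)
    n = len(base_partitions[0])
    coassoc = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for bp in base_partitions if bp[i] == bp[j])
        coassoc[i][j] = coassoc[j][i] = together / m
    return coassoc


def sample_certainty(coassoc):
    """ASSUMED stability score in [0, 1], not the paper's formula.

    A pair contributes 1 when all partitions agree on it (always or never
    together) and 0 when they are evenly split. Requires at least 2 samples.
    """
    n = len(coassoc)
    return [
        sum(abs(2.0 * coassoc[i][j] - 1.0) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def core_halo_ensemble(base_partitions, certainty_threshold=0.5, link_threshold=0.5):
    """Split samples into core/halo by certainty, cluster the cores, attach the halo."""
    coassoc = co_association(base_partitions)
    certainty = sample_certainty(coassoc)
    n = len(certainty)
    is_core = [c >= certainty_threshold for c in certainty]  # threshold is hypothetical
    if not any(is_core):
        raise ValueError("no core samples; lower certainty_threshold")

    labels = [-1] * n
    clusters = []  # clusters[k] lists the sample indices carrying label k

    # Step 1 (assumed): core structure = connected components among core
    # samples, linking two cores when most partitions put them together.
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue
        label = len(clusters)
        labels[seed] = label
        members, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if is_core[j] and labels[j] == -1 and coassoc[i][j] > link_threshold:
                    labels[j] = label
                    members.append(j)
                    stack.append(j)
        clusters.append(members)

    # Step 2 (assumed): assign halo samples one at a time, most certain first,
    # to the cluster with the highest mean co-association.
    halo = sorted((i for i in range(n) if not is_core[i]), key=lambda i: -certainty[i])
    for i in halo:
        def affinity(k):
            return sum(coassoc[i][j] for j in clusters[k]) / len(clusters[k])

        best = max(range(len(clusters)), key=affinity)
        labels[i] = best
        clusters[best].append(i)

    return labels, certainty, is_core


# Toy input: three base partitions of six samples; sample 2 switches cluster once.
toy_bps = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1]]
toy_labels, toy_certainty, toy_is_core = core_halo_ensemble(toy_bps)
```

In this toy input, sample 2 is the only one whose neighbors change between partitions, so it receives the lowest score under the assumed measure, is treated as a halo sample, and is attached to the core it co-occurs with most often. Adding each assigned halo sample to its cluster before handling the next one is one possible reading of "gradually assigns".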




Updated: 2021-05-26