Reliability-based fuzzy clustering ensemble

doi:10.1016/j.fss.2020.03.008

Fuzzy Sets and Systems

Volume 413, 15 June 2021, Pages 1-28

https://doi.org/10.1016/j.fss.2020.03.008

Abstract

In a clustering ensemble, the quality of the base-clusterings influences the consensus clustering. Although some studies have been devoted to weighting the base-clusterings, weighting at the fuzzy cluster level has been ignored; more specifically, these studies did not pay attention to the role of cluster reliability in the fuzzy clustering ensemble. In this paper, we propose a new fuzzy clustering ensemble framework, based on fuzzy cluster-level weighting, that requires no access to the features of the data-objects. The reliability of each fuzzy cluster is computed from an estimate of its unreliability and is used as its weight in the ensemble. The unreliability of a fuzzy cluster is estimated from the similarity between fuzzy clusters in the ensemble, using an entropic criterion. In our framework, the final clustering is produced by two types of consensus functions: (1) a reliability-based weighted fuzzy co-association matrix is constructed from the base-clusterings, and then a single traditional clustering algorithm such as hierarchical agglomerative clustering or K-means is applied to the matrix to produce the final clustering; (2) a new graph-based fuzzy consensus function, whose time complexity is linear in the number of data-objects. Experimental results on various standard datasets demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods in terms of evaluation criteria and clustering robustness.

Introduction

Clustering is the process of partitioning a set of data-objects (samples) into K subsets based on a similarity (distance) measure, such that the data-objects in each subset are more similar to one another than to the data-objects in other subsets. Each such subset is usually referred to as a cluster, and all clusters together are called a clustering. Based on the relationship of each data-object to the clusters, clustering algorithms can be divided into crisp and fuzzy clustering algorithms. In crisp clustering, a data-object belongs to exactly one cluster. In fuzzy clustering, each data-object is assigned to every cluster with a membership degree. Crisp clustering is a special case of fuzzy clustering in which the membership degree of a data-object to one cluster equals one and its membership degree to every other cluster is zero.
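To make the distinction concrete, the following minimal Python sketch represents a clustering as an N × K membership matrix (a list of rows) and checks the two conditions a fuzzy partition is usually taken to satisfy: every degree lies in [0, 1] and every row sums to one. The function names `is_fuzzy_partition` and `is_crisp_partition`, the tolerance, and the sample matrices are illustrative assumptions, not part of the paper.

```python
import math

def is_fuzzy_partition(memberships, tol=1e-9):
    """Return True if every row has degrees in [0, 1] that sum to 1."""
    for row in memberships:
        if any(u < -tol or u > 1 + tol for u in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True

def is_crisp_partition(memberships, tol=1e-9):
    """A crisp partition is a fuzzy partition whose degrees are all 0 or 1."""
    if not is_fuzzy_partition(memberships, tol):
        return False
    return all(min(abs(u), abs(u - 1.0)) <= tol
               for row in memberships for u in row)

# Hypothetical data: two data-objects, three clusters.
fuzzy = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
crisp = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Under these definitions every crisp matrix passes the fuzzy check, while a fuzzy matrix with fractional degrees fails the crisp one, which mirrors the special-case relationship described above.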

In the data clustering context, various clustering algorithms have emerged, each using a different similarity criterion and therefore a different objective function. All of these methods depend heavily on the dataset; in other words, no clustering algorithm can learn every dataset [1]. Hence, in recent years researchers have proposed clustering data with the help of an ensemble of clusterings to address this problem [2], [3], [4], [5]. This technique is named clustering ensemble. Its main aim is to search for a likely better and more stable result by aggregating the information extracted from multiple clusterings (also called base-clusterings or members) [6], [7]. The better and more robust result extracted from the base-clusterings is named the consensus clustering (also referred to in this research as the final clustering) [6], [8].

In summary, as shown in Fig. 1, a clustering ensemble consists of the following two phases [6]: (1) Base-clustering generation: base-clusterings are produced by single clustering algorithms (in this study, "single clustering" is used in contrast to "ensemble clustering"). (2) Base-clustering consolidation: the base-clusterings generated in phase 1 are combined through a consensus function to generate the final clustering. This paper focuses on the second phase and proposes a co-association-based fuzzy clustering ensemble method and a graph-based fuzzy clustering ensemble method.
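As a rough illustration of the consolidation phase, the sketch below builds an unweighted fuzzy co-association matrix from several membership matrices and then applies a small hand-written average-linkage agglomerative step to it. It rests on assumptions that do not come from the paper: the pairwise agreement inside one base-clustering is taken to be the sum over clusters of the minimum of the two membership degrees, and the names `fuzzy_co_association` and `average_linkage` as well as the toy data are hypothetical. Note that only the membership matrices are used, never the data features.

```python
def fuzzy_co_association(base_clusterings):
    """Average, over base-clusterings, of sum_i min(F_i(x_t), F_i(x_s))."""
    n = len(base_clusterings[0])
    co = [[0.0] * n for _ in range(n)]
    for f in base_clusterings:
        for t in range(n):
            for s in range(n):
                co[t][s] += sum(min(a, b) for a, b in zip(f[t], f[s]))
    m = len(base_clusterings)
    return [[v / m for v in row] for row in co]

def average_linkage(similarity, k):
    """Merge the two most similar groups until only k groups remain."""
    groups = [[i] for i in range(len(similarity))]

    def sim(g, h):
        # Mean pairwise similarity between two groups of data-objects.
        return sum(similarity[a][b] for a in g for b in h) / (len(g) * len(h))

    while len(groups) > k:
        i, j = max(
            ((i, j) for i in range(len(groups))
             for j in range(i + 1, len(groups))),
            key=lambda p: sim(groups[p[0]], groups[p[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    labels = [0] * len(similarity)
    for label, g in enumerate(groups):
        for idx in g:
            labels[idx] = label
    return labels

# Hypothetical ensemble: four data-objects, base-clusterings with K=2 and K=3.
f1 = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
f2 = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
co = fuzzy_co_association([f1, f2])
labels = average_linkage(co, 2)
```

With crisp base-clusterings this min-based agreement reduces to the classical co-association count (the fraction of base-clusterings in which two data-objects share a cluster), which is why it was chosen for the sketch. The base-clusterings may have different numbers of clusters, as `f1` and `f2` do here.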

Despite the greater generality of fuzzy clustering compared to crisp clustering, research on fuzzy cluster ensembles is still in its early stages, and relatively few approaches exist in this field. Some of the existing fuzzy cluster ensemble methods first convert fuzzy clusters into hard clusters and then compute the final clustering through hard consensus functions, which causes a loss of uncertainty information. Therefore, producing an efficient fuzzy consensus clustering from multiple fuzzy base-clusterings remains a challenging issue.
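The loss of uncertainty information can be seen in a very small example. In the sketch below, `harden` is a hypothetical helper that maps each data-object to the cluster with its largest membership degree; the two membership matrices are made-up values chosen so that one is confident and the other is nearly undecided.

```python
def harden(memberships):
    """Map each row of membership degrees to the index of its largest degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Hypothetical fuzzy clusterings of the same two data-objects.
confident = [[0.99, 0.01], [0.02, 0.98]]
ambiguous = [[0.51, 0.49], [0.45, 0.55]]

hard_confident = harden(confident)
hard_ambiguous = harden(ambiguous)
```

Both matrices harden to the same labels, so a hard consensus function applied afterwards cannot tell the confident clustering from the ambiguous one.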

In an ensemble of voters (learners), assigning a weight to each learner based on its quality can be an effective mechanism for improving the result of the ensemble. The weights of the learners in an ensemble can be set optimally if the accuracy of each learner is known [4]. But if the learners are clustering algorithms, the accuracy of a learner is meaningless [9]. So, the quality of the clustering obtained by a clustering algorithm can be used as an approximation of that algorithm's accuracy. In summary, the quality of the base-clusterings highly affects the consensus clustering (final clustering) obtained by the ensemble; in other words, low-quality base-clusterings may have a negative influence on the consensus results. Some researchers have investigated quality evaluation and weighting of the base-clusterings to improve the quality of the consensus clustering [10], [11], [12]. For a method that uses a weighting mechanism in a fuzzy clustering ensemble, we refer to the paper by Berikov [13]. However, these approaches assume that all of the clusters in the same base-clustering have the same reliability; they typically treat each base-clustering as an individual and assign a weight to it regardless of the diversity of the clusters inside it [10], [11], [12], [13]. In this research, reliability is defined as the quantity of certain knowledge of the ensemble about the cluster and is computed by the accretion amount of that cluster by the ensemble. Briefly, in the aforementioned papers weighting is considered at the clustering level, not at the cluster level. But due to the inherent complexity of real-world datasets, different clusters in the same clustering may have different reliability. Hence, it is necessary to consider the local diversity of ensembles (the quality of the clusters in the ensemble) and to deal with the different reliability of clusters. Although Zhong et al. investigated the reliability of crisp clusters by considering the Euclidean distances between data-objects in clusters [14], this method is not applicable to fuzzy clustering. In addition, it requires access to the original data features, and its efficacy relies heavily on the data distribution of the dataset, whereas in the general formulation of the cluster ensemble problem there is no access to the original data features. Therefore, the key question here is how to measure the reliability of fuzzy clusters, without access to the data features and without relying on specific assumptions about the data distribution, and how to weight the clusters accordingly to enhance the accuracy and robustness of the consensus clustering. In other words, the problem to be solved is how to compute the reliability of each fuzzy cluster as a fuzzy cluster quality measure and incorporate it into a weighting structure for boosting the consensus clustering.

In light of this, we propose a new fuzzy clustering ensemble framework based on ensemble-driven cluster reliability and a local weighting strategy, in which a weight is assigned to each cluster based on its reliability value. The contributions of this article are as follows:

• A method is proposed to estimate the unreliability of a fuzzy cluster in relation to a clustering by applying an entropic criterion to the membership degrees of all data-objects to the clusters; it requires no access to the original data features.
• A reliability-driven cluster indicator is proposed to measure the reliability of the fuzzy clusters in the ensemble, and it is used as the weight of the fuzzy clusters in the ensemble.
• A method is proposed to compute the reliability-based fuzzy co-association matrix in the fuzzy clustering ensemble.
• Three single clustering algorithms are applied to the obtained co-association matrix, and their effects on the quality of the proposed approach are shown on a variety of datasets.
• A reliability-based graph consensus function is proposed whose time complexity is linear in the number of data-objects.
• Extensive experiments on a variety of datasets indicate that the proposed fuzzy clustering ensemble approach performs better than state-of-the-art approaches in terms of clustering quality.
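The paper's exact formulas are not reproduced in this excerpt, so the following Python sketch only illustrates the general shape of the first three contributions under loudly stated assumptions. The overlap between two fuzzy clusters is assumed to be the sum of pointwise minima of their membership degrees, the unreliability of a cluster with respect to a clustering is assumed to be the Shannon entropy of its normalized overlaps with that clustering's clusters, and the mapping from total entropy to a weight, `exp(-H / (theta * M))` with a hypothetical parameter `theta`, is likewise an assumption. All function names are illustrative.

```python
import math

def column(clustering, i):
    """Membership degrees of all data-objects to cluster i."""
    return [row[i] for row in clustering]

def overlap(c, d):
    """Assumed fuzzy intersection size: sum of pointwise minima."""
    return sum(min(a, b) for a, b in zip(c, d))

def unreliability(cluster, clustering):
    """Entropy (bits) of how `cluster` spreads over the clusters of `clustering`."""
    k = len(clustering[0])
    shares = [overlap(cluster, column(clustering, j)) for j in range(k)]
    total = sum(shares)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in shares if s > 0)

def reliability_weights(ensemble, theta=0.5):
    """One weight in (0, 1] per fuzzy cluster; the exponential map is an assumption."""
    m = len(ensemble)
    weights = []
    for clustering in ensemble:
        row = []
        for i in range(len(clustering[0])):
            c = column(clustering, i)
            h = sum(unreliability(c, other) for other in ensemble)
            row.append(math.exp(-h / (theta * m)))
        weights.append(row)
    return weights

def weighted_co_association(ensemble, weights):
    """Co-association in which each cluster's agreement is scaled by its weight."""
    n = len(ensemble[0])
    co = [[0.0] * n for _ in range(n)]
    for clustering, w in zip(ensemble, weights):
        for t in range(n):
            for s in range(n):
                co[t][s] += sum(wi * min(a, b) for wi, a, b
                                in zip(w, clustering[t], clustering[s]))
    return [[v / len(ensemble) for v in row] for row in co]

# Hypothetical ensemble: f1 and f2 agree, f3 splits the data differently.
f1 = [[1, 0], [1, 0], [0, 1], [0, 1]]
f2 = [[1, 0], [1, 0], [0, 1], [0, 1]]
f3 = [[1, 0], [0, 1], [1, 0], [0, 1]]
ensemble = [f1, f2, f3]
weights = reliability_weights(ensemble)
co = weighted_co_association(ensemble, weights)
```

In this toy ensemble the clusters of `f1` and `f2` are confirmed by two of the three base-clusterings and therefore receive a larger weight than the clusters of `f3`, so the pair of data-objects grouped together by `f1` and `f2` ends up with a higher weighted co-association than the pair grouped only by `f3`. A single clustering algorithm could then be run on `co`, in the spirit of the fourth contribution.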

The rest of this work is structured as follows: Sec. 2 presents a review of related work. The formal background knowledge about ensemble clustering is presented in Sec. 3. The proposed fuzzy clustering ensemble approach is explained in Sec. 4. We show the experimental results in Sec. 5 and the conclusion and future work are presented in Sec. 6.

Section snippets

Related work

Considerable research effort has been devoted to crisp clustering ensembles. Here we briefly review the existing work most closely related to fuzzy clustering ensembles.

sCSPA, sHBGF and sMCLA, proposed by Punera and Ghosh [15], can be regarded as the starting points of fuzzy clustering ensemble research. sCSPA, which is the soft (fuzzy) version of CSPA (cluster-based similarity partitioning algorithm) [2], constructs a graph of all data-objects in which edges are weighted by pairwise similarities. For

Preliminaries

In this section, the general formulation of the dataset and some notation for the fuzzy clustering ensemble used in this paper are introduced.

Definition 1

A dataset X contains N data-objects in the form $X = \{x_{1}, x_{2}, \dots, x_{N}\}$, where each data-object contains M features.

Definition 2

A fuzzy clustering (partition) of dataset X is a two-dimensional matrix of size $N \times K$, where N is the number of data-objects and K is the number of clusters, denoted $F(X)$, such that $\forall t \in \{1, \dots, N\}, \forall i \in \{1, \dots, K\} : F_{i}(x_{t}) \in [0, 1]$ and $\sum_{i = 1}^{K} F_{i}(x_{t}) = 1$, where $F_{i}(x_{t})$ …

Proposed approach

In this paper, a new fuzzy clustering ensemble approach based on ensemble-driven cluster unreliability estimation and local weighting strategy is proposed. The main idea of our proposed approach is utilizing a weighting scheme at the fuzzy cluster level in which high-quality fuzzy clusters in the ensemble have more influence on the final clustering production. The fuzzy cluster quality is considered as fuzzy cluster reliability, and is defined by applying the concept of entropy over the

Experiments

The goal of the experimental section of this study is to answer the following questions:

• Can the proposed approach compete with the state-of-the-art ensemble clustering algorithms?
• How does changing the input parameters of the proposed approach influence the performance of the final clustering?

Conclusion and future work

This paper proposes a novel fuzzy clustering ensemble approach based on fuzzy-cluster-level weighting. The quantity of certain knowledge of the ensemble about a fuzzy cluster is considered as the cluster reliability. We first estimate the unreliability of fuzzy clusters by applying the similarity between fuzzy clusters in the entire ensemble based on an entropic criterion, and then obtain a reliability-driven cluster indicator (RDCI) as the quantity of certain knowledge of the ensemble about the fuzzy

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (59)

D. Huang et al., Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis, Neurocomputing (2015)
Z. Yu et al., Hybrid clustering solution selection strategy, Pattern Recognit. (2014)
C. Zhong et al., A clustering ensemble: two-level-refined co-association matrix with path-based transformation, Pattern Recognit. (2015)
R. Avogadri et al., Fuzzy ensemble clustering based on random projections for DNA microarray data analysis, Artif. Intell. Med. (2009)
U. Maulik et al., Modified differential evolution based fuzzy clustering for pixel classification in remote sensing imagery, Pattern Recognit. (2009)
S. Das et al., Automatic image pixel clustering with an improved differential evolution, Appl. Soft Comput. (2009)
J.C. Bezdek et al., FCM: the fuzzy c-means clustering algorithm, Comput. Geosci. (1984)
X. Sevillano et al., Positional and confidence voting-based consensus functions for fuzzy cluster ensembles, Fuzzy Sets Syst. (2012)
E. Bedalli et al., A heterogeneous cluster ensemble model for improving the stability of fuzzy cluster analysis, Proc. Comput. Sci. (2016)
P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
B.B. Chaudhuri et al., On correlation between two fuzzy sets, Fuzzy Sets Syst. (2001)
S. Raha et al., Similarity based approximate reasoning: fuzzy control, J. Appl. Log. (2008)
T.M. Silva Filho et al., Hybrid methods for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization, Expert Syst. Appl. (2015)
J.M. Kleinberg, An impossibility theorem for clustering
A. Strehl et al., Cluster ensembles—a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. (2002)
X.Z. Fern et al., Solving cluster ensemble problems by bipartite graph partitioning
L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms (2004)
A. Fred, Cluster ensemble methods: from single clusterings to combined solutions
S. Vega-Pons et al., A survey of clustering ensemble algorithms, Int. J. Pattern Recognit. Artif. Intell. (2011)
H. Liu et al., Spectral ensemble clustering via weighted k-means: theoretical and practical evidence, IEEE Trans. Knowl. Data Eng. (2017)
A. Topchy, A.K. Jain, W. Punch, Combining multiple weak clusterings, in: Third IEEE Int. Conf. Data Min., IEEE Comput....
F. Gullo et al., Diversity-based weighting schemes for clustering ensembles
T. Li et al., Weighted consensus clustering
V.B. Berikov, A probabilistic model of fuzzy clustering ensemble, Pattern Recognit. Image Anal. (2018)
K. Punera et al., Consensus-based ensembles of soft clusterings, Appl. Artif. Intell. (2008)
I.S. Dhillon, A divisive information-theoretic feature clustering algorithm for text classification, J. Mach. Learn. Res. (2003)
S. Kullback et al., On information and sufficiency, Ann. Math. Stat. (1951)
E. Dimitriadou et al., A combination scheme for fuzzy clustering, Int. J. Pattern Recognit. Artif. Intell. (2002)
P. Rathore et al., Ensemble fuzzy clustering using cumulative aggregation on random projections, IEEE Trans. Fuzzy Syst. (2018)

Cited by (42)

Semi-supervised fuzzy clustering algorithm based on prior membership degree matrix with expert preference
2024, Expert Systems with Applications
Multi-fuzzy clustering validity index ensemble: A Dempster-Shafer theory-based parallel and series fusion
2023, Egyptian Informatics Journal
A survey of underwater search for multi-target using Multi-AUV: Task allocation, path planning, and formation control
2023, Ocean Engineering
Geometric consistent fuzzy cluster ensemble with membership reconstruction for image segmentation
2023, Digital Signal Processing: A Review Journal
An ensemble hierarchical clustering algorithm based on merits at cluster and partition levels
2023, Pattern Recognition
Citation Excerpt :
The authors used multi-nominal logistic regression to discover the pattern of clustering results. Bagherinia et al. proposed a reliability-based weighted fuzzy clustering ensemble algorithm [16]. Here, the weight of each cluster is calculated based on its unreliability estimate with an entropic metric.
A survey of fuzzy clustering validity evaluation methods
2022, Information Sciences
Citation Excerpt :
Different fuzzy clustering algorithms adapt to different data sets [125–127]. Therefore, the integration of fuzzy clustering algorithm [128–131] can be introduced, so the combination of multiple clustering algorithms and validity function can enhance the adaptability of validity function, but it does not essentially change the structure of the validity evaluation.
