Abstract

Fuzzy c-means (FCM) is one of the best-known clustering methods for organizing a wide variety of datasets automatically and acquiring accurate classification, but it tends to fall into local minima. To overcome this weakness, several methods that hybridize PSO and FCM for clustering have been proposed in the literature, and these hybrid methods have been shown to improve accuracy over traditional partition clustering approaches. However, PSO-based clustering methods have poor execution times compared to partitional clustering techniques, and current PSO algorithms require tuning a range of parameters before they are able to find good solutions. This paper therefore introduces a hybrid method for fuzzy clustering, named FCM-ELPSO, which aims to address these shortcomings. It combines FCM with an improved version of PSO, called ELPSO, which adopts a new enhanced logarithmic inertia weight strategy to provide a better balance between exploration and exploitation. The new hybrid method uses the PBM(F) index and the objective function value as cluster validity indexes to evaluate the clustering effect. To verify the effectiveness of the algorithm, two types of experiments are performed, covering PSO clustering and hybrid clustering. The experiments show that the proposed approach significantly improves both the convergence speed and the clustering effect.

1. Introduction

In order to extract useful information from huge quantities of data quickly and accurately, many methods have been proposed. As an unsupervised learning method, clustering analysis is one of the vital tools for dealing with such data. Its objective is to partition an unlabeled dataset into a number of clusters such that elements in the same cluster show a high level of similarity, while elements from different clusters show a high level of dissimilarity. Clustering techniques have been studied extensively in a variety of application fields such as data mining, machine learning, pattern recognition, and image segmentation [1–3].

Clustering algorithms can be divided into two basic categories: hard and fuzzy [4]. Hard clustering methods assign each object to a single group, while fuzzy clustering methods introduce membership degrees between objects and the different clusters of the dataset and assign each element to multiple clusters simultaneously in accordance with the membership matrix. Therefore, the latter can handle overlapping partitions.

The most popular fuzzy clustering algorithm is fuzzy c-means (FCM), which was proposed by Bezdek et al. [5] and has been widely used in multiple domains [6, 7]. The goal of FCM is to minimize its criterion function and gradually obtain a more accurate membership matrix. However, the random selection of initial center points makes the iterative process prone to saddle points and local optima. Furthermore, if a dataset contains severe noise or is high dimensional, as in bioinformatics [8], the alternating optimization often fails to find the global optimum.

These shortcomings have motivated alternative approaches to fuzzy clustering, many of which are extensions of FCM. A kernel-based FCM (KFCM) was proposed by Zhang and Chen [9], which replaces the Euclidean distance metric with a kernel metric to achieve better mappings for nonlinearly separable datasets. Lin [10] proposed a novel evolutionary kernel intuitionistic FCM clustering algorithm (EKIFCM) that combines intuitionistic fuzzy sets (IFSs) with KFCM and utilizes genetic algorithms (GA) to optimize the parameters of EKIFCM simultaneously. Although these FCM variants aim to achieve good fuzzy clustering performance, they do not improve the random initialization process of FCM and still fall into local optima easily [11].

The probability of finding the global optimum may be increased by stochastic methods such as evolutionary or metaheuristic optimization algorithms. PSO has become one of the most popular metaheuristics and an important tool for many applications due to its versatility and simplicity. It has been found to provide better initial centroids for the FCM algorithm and thus improve FCM results, which has motivated many PSO-based methods for hard clustering [12] and several PSO-based methods for fuzzy clustering [11, 13, 14]. Cura [15] presented a new PSO approach to the clustering problem, employing the pure PSO technique to solve clustering problems with both known and unknown numbers of clusters, which provides a new idea for clustering.

Izakian and Abraham [16] proposed a hybrid fuzzy clustering method based on FCM and fuzzy PSO (FPSO), and their experiments show better results than FPSO and FCM alone. Quantum-behaved particle swarm optimization (QPSO) with a fully connected topology has also been coupled with FCM, forming a new hybrid method called QPSO-FCM [17]. However, these PSO-based methods are much slower than the traditional methods, which may limit their practical applications.

Another problem with PSO-based clustering methods, according to Alam [12], is the need to tune a range of parameters before they are able to find good solutions. To overcome these shortcomings, a hybrid method for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization (FCM-IDPSO) was proposed by Silva Filho et al. [18], who introduced IDPSO to adjust the parameters dynamically during training and thereby tackle the two main problems of PSO-based clustering methods. Many improved PSO-FCM clustering methods have been successfully applied in practice [19–22]. It is worth noting that the complex structure and heavy computational load of PSO-based methods leave room for further improvement.

In recent years, many excellent hybrid methods that do not use PSO as the optimization algorithm have been proposed for cluster analysis. CRO-FCM [23] uses a chemical-based metaheuristic to obtain optimal cluster centers for FCM; ETLBO-FCM [24] incorporates elicit teaching-learning-based optimization into FCM to overcome its major limitations; and Rahul et al. [25] introduced bat optimization into FCM and utilized a maxi-min classifier to determine the number of clusters, with results showing improved clustering accuracy. These studies have greatly promoted the development of clustering algorithms.

One of the main contributions of this paper is to introduce a new version of PSO with an enhanced logarithmic decreasing strategy (ELPSO) for clustering. Based on this strategy, ELPSO adaptively takes different inertia weight values during different periods, thus providing a better balance between exploration and exploitation, avoiding premature convergence to local minima, and thereby obtaining better solutions. The other contribution of this paper is a new method for the fuzzy clustering problem that hybridizes FCM with ELPSO, named FCM-ELPSO, which makes use of the merits of both algorithms. This hybrid method introduces ELPSO into the training process: it uses ELPSO's global exploration to find suitable initial clustering prototypes for FCM and its local exploration to avoid falling into local optima, and it utilizes the fast convergence of FCM to improve the results and the convergence time. Both clustering methods are tested independently on UCI datasets, and the results are compared with other PSO-based clustering methods.

The structure of the paper is as follows. Section 2 outlines all necessary prerequisites. In Section 3, a new version of PSO for clustering, named ELPSO, and the hybrid method FCM-ELPSO are proposed. Section 4 presents the results of experiments on UCI datasets. Section 5 covers the main conclusions.

2. Theoretical Basis

In this section, we briefly describe some basic concepts of FCM, the original PSO (also called standard PSO, SPSO), some improved versions of PSO with different inertia weight strategies, and a cluster validity index used in the hybrid method to evaluate the clustering effect.

2.1. FCM

We define $S = \{x_1, x_2, \ldots, x_N\}$ as a clustering dataset of $N$ objects indexed by $j$; each object $x_j$ is represented by a vector of quantitative variables. We define $V = \{v_1, v_2, \ldots, v_C\}$ as the prototypes of the $C$ clusters listed by $i$ and $U = [u_{ij}]$ as a fuzzy partition matrix, where $u_{ij}$ indicates the membership of the $j$th object with the $i$th prototype. Here $x_j \in \mathbb{R}^Q$ and $v_i \in \mathbb{R}^Q$, where $Q$ is the data dimensionality. The constraints on $u_{ij}$ are as follows:

$$u_{ij} \in [0, 1], \qquad \sum_{i=1}^{C} u_{ij} = 1 \quad \forall j, \qquad 0 < \sum_{j=1}^{N} u_{ij} < N \quad \forall i. \qquad (1)\text{--}(3)$$

The goal of the FCM algorithm is to find the optimal prototype matrix $V$ and the corresponding membership degree matrix $U$ that minimize an objective function given by the following equation:

$$J_m(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2}, \qquad (4)$$

where $m$ is the fuzzy weighting exponent and $d_{ij}$ is the Euclidean distance that indicates the dissimilarity of data vector $x_j$ from cluster center $v_i$.

The parameter $d_{ij}$ is obtained by the following equation:

$$d_{ij} = \lVert x_j - v_i \rVert. \qquad (5)$$

To minimize the criterion $J_m$, the clustering prototypes and the membership degrees are updated according to equations (6) and (7), respectively:

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}}, \qquad (6)$$

$$u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1}. \qquad (7)$$

After computing the memberships of all the objects, the new prototypes of the clusters are calculated. The process stops when the prototypes stabilize; that is, the prototypes from the previous iteration are close to those generated in the current iteration, with the change normally below an error threshold.
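For illustration only, this alternating update can be sketched in a few lines of Python/NumPy as follows (the function and variable names are ours and not part of the original formulation):

import numpy as np

def fcm(X, C, m=2.0, max_iter=100, eps=1e-5, seed=0):
    """Minimal FCM sketch: X is an (N, Q) data matrix, C the number of clusters."""
    rng = np.random.default_rng(seed)
    N, Q = X.shape
    V = X[rng.choice(N, C, replace=False)]                  # random initial prototypes
    for _ in range(max_iter):
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # (C, N) distances d_ij
        D = np.fmax(D, 1e-12)                               # guard against division by zero
        U = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)  # eq. (7)
        V_new = (U ** m @ X) / (U ** m).sum(axis=1, keepdims=True)                    # eq. (6)
        if np.linalg.norm(V_new - V) < eps:                 # prototypes have stabilized
            V = V_new
            break
        V = V_new
    J = ((U ** m) * D ** 2).sum()                           # criterion J, eq. (4)
    return U, V, J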

2.2. Original Particle Swarm Optimization

PSO was originally introduced in terms of the social and cognitive behavior of bird flocking and fish schooling. The potential solutions, called particles, fly through the problem space by following the current best particles. Each particle keeps track of the coordinates in the problem space associated with the best solution it has achieved so far; this solution is evaluated by its fitness value, which is also stored, and is called $pBest$. Another best value tracked by PSO is the best value obtained so far by any particle in the swarm; this global best is called $gBest$. The search for better positions follows the rule of equations (8) and (9):

$$V_l(t+1) = \omega V_l(t) + c_1 r_1 (pBest_l - X_l(t)) + c_2 r_2 (gBest - X_l(t)), \qquad (8)$$

$$X_l(t+1) = X_l(t) + V_l(t+1), \qquad (9)$$

where $X_l$ and $V_l$ are the position and velocity vectors of particle $l$, respectively; $\omega$ is the inertia weight; $c_1$ and $c_2$ are positive constants, called acceleration coefficients, which control the influence of $pBest_l$ and $gBest$ in the search process; and $r_1$ and $r_2$ are random values in the range [0, 1]. The fitness value of each particle's position is determined by a fitness function, and PSO is usually executed with repeated application of equations (8) and (9) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.
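In code, a single application of equations (8) and (9) for one particle may be sketched as follows (names are again ours):

import numpy as np

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=np.random.default_rng(1)):
    """One velocity/position update, equations (8) and (9)."""
    r1, r2 = rng.random(), rng.random()       # random scalars in [0, 1]
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (8)
    return x + v_new, v_new                   # eq. (9)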

2.3. Some Improved Versions of PSO with Different Inertia Weight Strategies

Using statistical theory to analyze the variance of the basic parameters of PSO, including the inertia weight and the acceleration constants, it can be concluded that the inertia weight has a tremendous impact on the overall performance of PSO [26]. Many studies have shown that larger inertia weight values give better global search capability, while smaller values favor local exploitation [27]. Different adaptive inertia weight strategies have therefore been proposed to achieve a better balance between exploration and exploitation capabilities and to obtain more stable and satisfactory results, including linear, nonlinear, fuzzy-rule-based, random, and other strategies.

In this section, three kinds of inertia weight strategies that are widely used in a variety of application domains are reviewed; the corresponding algorithms can be found in [28–30]. The method proposed in this paper is compared with these algorithms in Section 4.

2.3.1. Linear Inertia Weight Strategy

The monotonically decreasing inertia weight adjustment strategy was introduced into PSO by Eberhart [28], aiming to enhance the fine-tuning ability of PSO. However, a linear strategy cannot achieve an accurate balance between local and global search because the PSO search process is nonlinear and complex. Consequently, a linearly decreasing inertia weight does not always perform better than an appropriate fixed inertia weight.

2.3.2. Natural Exponential Inertia Weight Strategy

Inspired by the linear decreasing inertia weight strategy, Chen et al. [29] proposed two inertia weight strategies based on natural exponential functions. In their experimental settings, these natural exponential strategies converge faster in the early stage of the PSO search process than the linear adjustment strategy.

2.3.3. Random Inertia Weight Strategy

In a dynamic environment, it is difficult to predict whether exploration or exploitation will be more beneficial at a given time. Randomness was therefore introduced into the inertia weight strategy of PSO to address this problem in [30], where, in the context of using particle swarms to track and optimize dynamic systems, a new way of calculating the inertia weight value was proposed.
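To make the three families of strategies concrete, the following sketch gives representative forms: the classic linearly decreasing weight [28], a plausible natural-exponential variant (the exact expressions in [29] may differ, so this form is an assumption), and the random weight of [30]:

import numpy as np

def w_linear(t, T, w_max=0.9, w_min=0.4):
    # Linearly decreasing inertia weight [28]
    return w_max - (w_max - w_min) * t / T

def w_natural_exp(t, T, w_max=0.9, w_min=0.4):
    # A representative natural-exponential decreasing form (assumed shape;
    # see [29] for the exact expressions)
    return w_min + (w_max - w_min) * np.exp(-4.0 * t / T)

def w_random(rng=np.random.default_rng(2)):
    # Random inertia weight [30]: uniformly distributed in [0.5, 1.0), mean 0.75
    return 0.5 + rng.random() / 2.0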

2.4. Cluster Index PBM(F)

Pakhira et al. [31] proposed a validity index called the PBM index. The index was developed for both crisp and fuzzy clustering; however, here we review only the fuzzy version, called the PBM(F) index. The index is defined as

$$\mathrm{PBM(F)} = \left( \frac{1}{C} \cdot \frac{E_1}{E_C} \cdot D_C \right)^{2}, \qquad (10)$$

where $E_1 = \sum_{j=1}^{N} \lVert x_j - \bar{x} \rVert$; $D_C = \max_{i,k} \lVert v_i - v_k \rVert$; $C$ is the number of clusters; and $\bar{x}$ is the center of dataset $S$.

$E_C$ is different from $E_1$ and is considered to be

$$E_C = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \lVert x_j - v_i \rVert, \qquad (11)$$

where $N$ is the total number of patterns in the dataset, $U = [u_{ij}]$ is a partition matrix for the data, and $v_i$ is the centroid of the $i$th cluster; here, the fuzzy parameter $m$ is set to 1.5.

The ratio $E_1/E_C$ relates the sum of weighted intracluster distances for the complete dataset taken as a single cluster to that of the $C$-cluster system; this factor measures the compactness of a $C$-cluster system. The factor $D_C$ is the maximum intercluster separation in a $C$-cluster system and signifies between-cluster separation. Higher values of the PBM(F) index indicate better clustering in the sense that the clusters are well separated and relatively compact.
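A direct transcription of the index into NumPy (with our own naming) is:

import numpy as np

def pbm_f(X, V, U, m=1.5):
    """PBM(F) index: X is (N, Q) data, V is (C, Q) centers, U is (C, N) memberships."""
    C = V.shape[0]
    E1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()       # whole dataset as one cluster
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # (C, N) point-center distances
    EC = ((U ** m) * D).sum()                                   # weighted intracluster distances
    DC = max(np.linalg.norm(V[i] - V[k])                        # maximum intercluster separation
             for i in range(C) for k in range(i + 1, C))
    return ((1.0 / C) * (E1 / EC) * DC) ** 2                    # higher is better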

3. Proposed Algorithms

In this section, we introduce the new version of PSO with the enhanced logarithmic decreasing strategy, named ELPSO, in detail and give the algorithmic process for the clustering application. Then, based on ELPSO and FCM, a hybrid algorithm called FCM-ELPSO is formed to combine the merits of the two algorithms.

3.1. Enhanced Logarithmical PSO (ELPSO)

In order to adjust the performance of the particle swarm and balance its global and local search capabilities during the flight process, a simple and effective inertia weight adjustment strategy is introduced into PSO, yielding a new version of PSO called enhanced logarithmic decreasing PSO (ELPSO). The new strategy computes the inertia weight $\omega(t)$ as a logarithmically decreasing function of the iteration counter, given by equation (12), where $t$ is the current iteration and $z$ is the regulatory factor for the fine-tuning ability of PSO, whose value can be set to 1.05 by experience. Equations (13) and (14) show the new velocity and position formulas of particle $l$ at instant $t$ using the new inertia weights:

$$V_l(t+1) = \omega(t) V_l(t) + c_1 R_1 \circ (pBest_l(t) - X_l(t)) + c_2 R_2 \circ (gBest(t) - X_l(t)), \qquad (13)$$

$$X_l(t+1) = X_l(t) + V_l(t+1). \qquad (14)$$

Every term in equations (13) and (14) has the same matrix dimensions except the acceleration coefficients $c_1$ and $c_2$, which are scalars. In order to increase the randomness of the particle swarm search, we set the random values $R_1$ and $R_2$ as matrices rather than scalars. The random matrices of each particle are reinitialized during every iteration, and each of their elements lies in the range [0, 1].
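The update step can be sketched as follows. Since equation (12) defines the exact weight function, the w_log used here is only an assumed logarithmic decreasing shape involving the regulatory factor z; the remainder follows equations (13) and (14):

import numpy as np

def w_log(t, T, w_initial=0.9, w_min=0.4, z=1.05):
    # Assumed logarithmic decreasing inertia weight; a stand-in for eq. (12)
    return w_initial - (w_initial - w_min) * np.log(1.0 + z * t) / np.log(1.0 + z * T)

def elpso_step(X_l, V_l, pbest, gbest, t, T, c1=2.0, c2=2.0, rng=np.random.default_rng(3)):
    """Matrix-form update of eqs. (13) and (14); X_l and V_l are (C, Q) matrices."""
    R1 = rng.random(X_l.shape)     # fresh random matrices, one element per coordinate
    R2 = rng.random(X_l.shape)
    V_new = w_log(t, T) * V_l + c1 * R1 * (pbest - X_l) + c2 * R2 * (gbest - X_l)
    return X_l + V_new, V_new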

Here, we give the method for clustering which employs the pure ELPSO technique.

Let the position of particle $l$, represented by $X_l$ ($l = 1, 2, \ldots, P$, where $P$ is the size of the population), be the prototype matrix, whose size is $C \times Q$, where $C$ is the chosen cluster number and $Q$ is the dimensionality of the dataset. In this way, $X_l$ may be expressed as follows:

$$X_l = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1Q} \\ v_{21} & v_{22} & \cdots & v_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ v_{C1} & v_{C2} & \cdots & v_{CQ} \end{pmatrix}.$$

Therefore, a swarm represents a number of candidate cluster centers for the data vectors. Each data vector belongs to a cluster according to its membership function, and thus a fuzzy membership is assigned to each data vector. In every iteration, each particle carries one center per cluster and thereby presents a solution given by a vector of cluster centers. The method determines the position vector of every particle, updates it, and then changes the positions of the cluster centers. The fitness function for evaluating the generated solutions is stated as

$$f(X_l) = \frac{K}{J_m(l)},$$

where $J_m(l)$ is the objective function of FCM, as shown in equation (4), calculated for particle $l$, and $K$ is a positive constant. The smaller $J_m(l)$ is, the better the clustering effect and the higher the fitness $f(X_l)$.

Notations:
P: the population size of ELPSO; ω_initial: the initial inertia weight of ELPSO; ω_l: the inertia weight of particle l; c1 and c2: acceleration coefficients; X_l: the position of particle l; V_l: the velocity vector of particle l; pBest_l: the best position that particle l has achieved up to instant t; gBest: the best position achieved by the swarm up to instant t; U_l: the membership degree matrix of particle l; J_l: the fitness value of particle l; T: the maximum number of iterations;
Input: dataset S and number of clusters C;
Output: the best position gBest.
Process:
(1) Create a swarm with P particles;
(2) Initialize the parameters of ELPSO, including the population size P, ω_initial for each particle l (l = 1, 2, 3, …, P), and c1 and c2;
(3) Initialize X_l, V_l, and pBest_l for each particle l (l = 1, 2, 3, …, P) and gBest for the swarm;
(4) Repeat {
(5)  Calculate the membership degree matrix U_l of each particle;
(6)  Calculate the criterion J of each particle;
(7)  Calculate the pBest_l of each particle;
(8)  Calculate the gBest of the swarm;
(9)  Update the velocity of each particle using equation (13);
(10)  Update the position of each particle using equation (14);
(11)  For each particle l (l = 1, 2, 3, …, P), update ω_l using equation (12);
(12)  Store gBest as the current best prototype matrix;
(13)  t = t + 1;
   }
Until the ELPSO termination condition is met (∗).
Return the gBest matrix.
(∗) The termination condition of PSO in this method is t ≥ T (the maximum number of iterations is reached) or the velocity updates are close to zero over a number of iterations.

The pseudocode is shown in Algorithm 1.
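Under the same assumptions, Algorithm 1 admits a compact Python rendering (reusing the elpso_step and w_log sketches above, together with our own helper names):

import numpy as np

def memberships(X, V, m=2.0):
    # Equation (7): fuzzy memberships for fixed prototypes V
    D = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
    return 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)

def criterion_J(X, V, U, m=2.0):
    # Equation (4): the FCM objective, used here to rank particles
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    return ((U ** m) * D ** 2).sum()

def elpso_cluster(X, C, P=30, T=500, m=2.0, seed=4):
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    pos = X[rng.choice(N, size=(P, C))]            # each particle: a (C, Q) prototype matrix
    vel = np.zeros_like(pos)
    pbest, pbest_J = pos.copy(), np.full(P, np.inf)
    gbest, gbest_J = pos[0].copy(), np.inf
    for t in range(T):
        for l in range(P):                         # steps (5)-(8): evaluate, update the bests
            J = criterion_J(X, pos[l], memberships(X, pos[l], m), m)
            if J < pbest_J[l]:
                pbest_J[l], pbest[l] = J, pos[l].copy()
            if J < gbest_J:
                gbest_J, gbest = J, pos[l].copy()
        for l in range(P):                         # steps (9)-(11): move the swarm
            pos[l], vel[l] = elpso_step(pos[l], vel[l], pbest[l], gbest, t, T)
    return gbest, gbest_J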

3.2. The Hybrid Method for Fuzzy Clustering Based on Fuzzy c-Means and Improved Particle Swarm Optimization

Although FCM requires fewer function evaluations, it usually falls into local optima. In this section, the FCM algorithm is integrated with the ELPSO algorithm to form a hybrid clustering algorithm called FCM-ELPSO, which maintains the merits of both FCM and ELPSO. This hybrid method introduces ELPSO into the training process: it uses ELPSO's global exploration to find suitable initial clustering prototypes for FCM and its local exploration to avoid falling into local optima, and it utilizes the fast convergence of FCM to improve the results and the convergence time.

Notations:
P: the population size of ELPSO; ω_initial: the initial inertia weight of ELPSO; ω_l: the inertia weight of particle l; c1 and c2: acceleration coefficients; X_l: the position of particle l; V_l: the velocity vector of particle l; pBest_l: the best position that particle l has achieved up to instant t; gBest: the best position achieved by the swarm up to instant t; U_l: the membership degree matrix of particle l; J_l: the fitness value of particle l; T_PSO: the maximum number of iterations of the PSO part; T_FCM: the maximum number of iterations of the FCM part; m: the level of cluster fuzziness;
Input: dataset S and number of clusters C;
Output: the best position gBest.
Process:
(1) Create a swarm with P particles;
(2) Initialize the parameters of ELPSO, including the population size P, ω_initial for each particle l (l = 1, 2, 3, …, P), c1, c2, and m;
(3) Initialize X_l, V_l, and pBest_l for each particle l (l = 1, 2, 3, …, P) and gBest for the swarm;
(4) do {
  ELPSO:
  Repeat {
(5)  Calculate the membership degree matrix U_l of each particle;
(6)  Calculate the criterion J of each particle;
(7)  Calculate the pBest_l of each particle;
(8)  Calculate the gBest of the swarm;
(9)  Update the velocity of each particle using equation (13);
(10)  Update the position of each particle using equation (14);
(11)  For each particle l (l = 1, 2, 3, …, P), update ω_l using equation (12);
(12)  Store gBest as the current best prototype matrix;
(13)  t = t + 1;
   }
Until the ELPSO termination condition is met (∗).
 FCM:
 Repeat {
(14)  Calculate the membership degrees using equation (7);
(15)  Calculate the cluster prototypes using equation (6);
(16)  Calculate the pBest_l of each particle;
(17)  Calculate the gBest of the swarm;
(18)  Store gBest as the current best prototype matrix;
  }
Until the FCM termination condition is met (∗∗)
} While the overall termination condition is not met (∗∗∗)
Return the gBest matrix.
(∗) When it reaches 95 iterations (T_PSO) or there is a variation less than or equal to 0.00001 in criterion J.
(∗∗) When it reaches 5 iterations (T_FCM) or there is a variation less than or equal to 0.00001 in criterion J.
(∗∗∗) When it reaches 500 total iterations (ELPSO + FCM) or when there are no changes to gBest in two consecutive runs of FCM-ELPSO (ELPSO followed by FCM).

The pseudocode is shown in Algorithm 2.
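A simplified sketch of Algorithm 2, built on the helpers above, is given below. For brevity, it refines only the global best in the FCM phase and re-seeds the swarm on each outer pass, whereas Algorithm 2 applies the FCM updates to every particle and carries the swarm state across phases:

def fcm_phase(X, V, m=2.0, iters=5, tol=1e-5):
    # FCM refinement (eqs. (6) and (7)) started from ELPSO's best prototypes
    J_prev = float("inf")
    for _ in range(iters):
        U = memberships(X, V, m)
        V = (U ** m @ X) / (U ** m).sum(axis=1, keepdims=True)   # eq. (6)
        J = criterion_J(X, V, U, m)
        if abs(J_prev - J) <= tol:                               # (**) variation on J
            break
        J_prev = J
    return V, J

def fcm_elpso(X, C, total_iters=500, T_pso=95, T_fcm=5, m=2.0):
    gbest, gbest_J = None, float("inf")
    used = 0
    while used < total_iters:                                    # (***) total iteration budget
        cand, _ = elpso_cluster(X, C, T=min(T_pso, total_iters - used), m=m)  # ELPSO phase
        used += T_pso
        cand, cand_J = fcm_phase(X, cand, m, iters=T_fcm)        # FCM phase
        used += T_fcm
        if cand_J >= gbest_J - 1e-5:                             # no improvement: stop
            break
        gbest, gbest_J = cand, cand_J
    return gbest, gbest_J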

4. Experiments and Results

This section is divided into two parts, ELPSO clustering and hybrid clustering, which use Algorithm 1 and Algorithm 2 to obtain the corresponding results, respectively. All experiments are based on the MATLAB 2016b platform and executed on a computer with an Intel Core i7-8750H 2.20 GHz CPU running Microsoft Windows 10.

For evaluating the performance of the proposed algorithms, nine well-known UCI Machine Learning Repository datasets have been selected: Abalone, Ecoli, Glass, Image segmentation, Page blocks classification, Spectf, Steel plates faults, Ultrasonic flowmeter diagnostics, and Yeast. These datasets include examples of low, medium, and high dimensional data with various partitions. A detailed description of the datasets is shown in Table 1.

4.1. ELPSO Clustering

The ELPSO, the original PSO, and the three improved versions with different inertia weight strategies described in Section 2.3 are tested here to evaluate the performance of these heuristic algorithms. On the Abalone, Ecoli, Glass, and Image segmentation datasets, each method is run 30 times independently, with 500 iterations per run.

According to the methodology used by Izakian and Abraham [16], criterion J is introduced to evaluate the clustering effect: the lower the value of J, the better the clustering effect. Therefore, the run with the minimum final value of criterion J was considered the optimal result. The average values were also recorded to account for the stochastic nature of the algorithms. For a better view of the results, the best values and the average values of J are shown in Figures 1–4, respectively.

Since the inertia weight plays an important role in the overall performance of the algorithm, all parameters except the inertia weight are set identically to ensure that it is the only variable. The parameter values for each algorithm are set as follows.

Population size: set to 30 uniformly for all algorithms. ELPSO: c1 = c2 = 2, ω is dynamically adjusted according to the proposed strategy of equation (12), and z is set to 1.05. The parameters of the other algorithms are set to be consistent with ELPSO, and their inertia weight strategies follow the reference literature [28–30].

The results are shown as follows.

For a better observation of the experimental results, we extract the curves of the first 200 iterations separately and place them inside the overall iteration graphs. In this way, the convergence trend of each algorithm can be perceived explicitly. In addition, the values of criterion J at the 50th, 200th, and 500th iterations are listed in Tables 2 and 3, which report the optimal values and the average values of the experiments, respectively.

Figures 1–4 show the results for the five approaches, represented by five colored curves. In each figure, the horizontal axis represents the number of iterations, and the vertical axis represents criterion J; a smaller value of J indicates better results.

The optimal result over 30 runs represents the extreme capability of an algorithm, while the average result over 30 runs better illustrates its typical performance. It is clearly seen from Figures 1 to 4 that ELPSO converges more quickly than the other algorithms in both the best and average graphs, especially in the first fifty iterations.

Tables 2 and 3 show that ELPSO always achieves the smallest value of criterion J at the 50th, 200th, and 500th iterations, outperforming the other four algorithms in both the best values and the average values. Although EPSO finally obtained the same optimal value as ELPSO on the Abalone dataset, its early convergence was slower than that of ELPSO. From the results on the four datasets, LPSO is the most likely of the five algorithms to fall into a local optimum, whereas ELPSO was never trapped in a local optimum thanks to its appropriate inertia weight selection strategy.

The results of the tests lead to the conclusion that the proposed ELPSO is efficient, converges rapidly, balances global and local search effectively, and yields very encouraging results in terms of the quality of the solutions found.

4.2. Hybrid Methods Clustering

In this section, the FCM-ELPSO proposed in this work is compared with four other PSO-based hybrid algorithms: FCM-SPSO, FCM-LPSO, FCM-EPSO, and FCM-RPSO. In addition, GA-FCM is added to the test. To evaluate the performance of all of the above algorithms, eight UCI datasets are selected: Ecoli, Glass, Image segmentation, Page blocks classification, Spectf, Steel plates faults, Ultrasonic flowmeter diagnostics, and Yeast, as shown in Table 1.

To quantitatively evaluate the convergence effect, the fundamental criterion can be described as follows: the distance between objects in the same cluster should be as small as possible, and the distance between objects in different clusters should be as large as possible. Criterion J is again used to evaluate the clustering effect, as in Section 4.1. Additionally, an effective cluster validity index, PBM(F), described in detail above, is introduced into the evaluation. It is worth repeating that, for a given dataset and a determined number of clusters, higher values of the PBM(F) index indicate better clustering in the sense that the clusters are well separated and relatively compact.

Each algorithm is run 30 times with random initializations on every dataset, and the partition corresponding to the best criterion value is selected; its PBM(F) value is then calculated. Furthermore, the average and standard deviation over the 30 repetitions are computed for both criterion J and the validity index PBM(F). The parameters of the PSO part of the five PSO-based algorithms take the same values as in Section 4.1, and the fuzzy parameter m of the FCM part is set to 2. The results are shown as follows.

Table 4 shows the best objective function values, as expressed in equation (4), obtained by the clustering algorithms. For closer inspection, the average values are provided separately in Table 5. It should be noted that the hybrid methods always converge before reaching the aforementioned maximum number of iterations [16]. Hence, under the same stopping condition, the performance of the algorithms can be judged by their results.

Tables 4 and 5 show that FCM-ELPSO always achieves the smallest value of criterion J. To further illustrate the performance of these algorithms, we use the standard deviation to describe the dispersion of the values around the mean: the smaller the standard deviation, the smaller the convergence range and the more robust the algorithm. Table 6 shows the standard deviations of criterion J.

In Table 6, FCM-ELPSO has the smallest standard deviation on five datasets: Glass, Page blocks classification, Spectf, Ultrasonic flowmeter diagnostics, and Yeast. FCM-SPSO has the smallest on two, Image segmentation and Steel plates faults, and FCM-LPSO on one, Ecoli. It can be seen that FCM-ELPSO has a smaller convergence range and higher robustness.

Tables 7–9 show the corresponding values of the validity index PBM(F).

FCM-ELPSO achieves the best PBM(F) values on five datasets, as shown in Table 7, while FCM-RPSO performs better on Glass and Page blocks classification and FCM-LPSO performs best on Spectf. In terms of the average results and the standard deviation, FCM-ELPSO outperforms the other algorithms. It is also noticeable that GA-FCM does not perform as well as the PSO-based hybrid clustering algorithms.

Comparing the results of the two cluster validity indexes shows that the best criterion J is not always associated with the best PBM(F) value, because no cluster validity index is applicable to all datasets. Nevertheless, the experimental results demonstrate that FCM-ELPSO performs better and is more robust. The hybrid algorithm combines the merits of both components to prevent premature convergence and trapping in local optima, improves the convergence speed slightly, and obtains satisfactory results.

5. Conclusion

This paper proposes ELPSO, which better balances exploration and exploitation, avoids falling into local optima, and has excellent convergence ability. To overcome the shortcomings of PSO-based fuzzy clustering algorithms, ELPSO and FCM are combined into a hybrid method called FCM-ELPSO, which utilizes the global search property of ELPSO to produce suitable initial clustering prototypes for FCM. FCM-ELPSO constantly corrects the clustering direction during training. As a randomized initialization approach, the hybrid method thus alleviates the problems faced by FCM, namely, its sensitivity to initialization and its tendency to fall into local minima. The experiments test ELPSO and the hybrid algorithm separately. Experimental results show that ELPSO and FCM-ELPSO perform well on the UCI datasets. In particular, FCM-ELPSO produces higher-quality clusters with a smaller standard deviation than the other clustering methods on the selected datasets, especially in the high-dimensional and large-data cases.

For future work, we will explore the practical application of the proposed methods in different fields, such as image segmentation, text mining, and medical problems. Furthermore, we will research novel initialization methods of PSO to improve the performance for complex datasets.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (2018YFB1308400).