Abstract

A multiscale cooperative differential evolution algorithm is proposed to address two weaknesses of traditional differential evolution algorithms: a narrow search range in the early stage and slow convergence in the later stage. First, a multipopulation structure is adopted so that each subpopulation is paired with a corresponding mutation strategy, preserving individual diversity during evolution. Then, covariance learning among populations is developed to establish a suitable rotated coordinate system for the crossover operation. Meanwhile, an adaptive parameter adjustment strategy is introduced to balance exploration and convergence. Finally, the proposed algorithm is tested on the CEC 2005 benchmark functions and compared with other state-of-the-art evolutionary algorithms. The experimental results show that the proposed algorithm outperforms the compared algorithms on global optimization problems.

1. Introduction

Differential evolution (DE) is a bionic intelligence method proposed by Rainer Storn and Kenneth Price in 1995, simulating the survival of the fittest [1, 2]. The algorithm uses mutation, crossover, and selection operations to mimic genetic variation during biological evolution and retains highly adaptable individuals to approach optimal solutions. To address population stagnation and premature convergence, researchers have mainly focused on three aspects: control parameter setting and mutation strategy selection [3-6], crossover operation [7-9], and population structure [10-12]. DE has attracted wide attention from researchers because of its simple encoding, good convergence, and strong robustness, and it has been applied in many fields such as industrial control [13], antenna design [14], power systems [15], and image processing [16].

Parameter control and evolutionary strategy selection are the two most discussed aspects of DE. On the one hand, the control parameters must be set: the scaling factor F, the crossover probability CR, and the population size NP [17]. On the other hand, different optimization problems call for different strategies [18], so the most suitable strategy must be chosen. The parameter setting affects the population diversity [19], the exploration ability in the early period, and the convergence in the later period [20]. The choice of evolutionary strategy is the key to balancing exploration and convergence in DE, and different evolution strategies exhibit different exploration capabilities and convergence tendencies. At the same time, different crossover operations have different effects on the search for the global optimum. The traditional binomial crossover operation is widely used and effective to a degree, but it depends strongly on the crossover coordinate system. In addition, the population structure is also an important factor in algorithm performance. If the population size is too small, effective alleles are easily lost, reducing the generation of competitive individuals. In contrast, if the population size is too large, the probability of the algorithm finding the correct search direction decreases [11].

Because of premature convergence, parameter control, strategy improvement, crossover operation, and population structure have attracted increasing attention as means to improve the performance of DE [17, 21-24]. Numerous improved DE algorithms [25, 26] have therefore been proposed based on parameters and strategies, such as parameter-adaptive jDE [21], JADE using the “current-to-pbest/1” strategy and adaptive parameters [24], SaDE using an adaptive difference strategy [23], CoDE combining trial vector generation strategies with control parameter settings [27], EPSDE with an ensemble of mutation strategies and control parameters [17], TDE with a triangular mutation strategy [28], and the super-fit multicriteria adaptive SMADE [29]. Moreover, DE has been improved through the crossover operation, for example, ODE [30] adopting orthogonal crossover operators and CoBiDE [31] using covariance learning and a bimodal distribution of parameters, and through the population structure, for example, SPSRDEMMS [32] with multimutation strategies, MPEDE [33] with multiple populations and a strategy set, the master-slave model [34], the island model [35], the cellular model [36], the level model [37], and the pool model [38]. In recent years, population-partitioning techniques have also been used to improve other evolutionary algorithms, including particle swarm optimization and genetic algorithms [39-44].

To further improve convergence and reduce population stagnation, a multiscale cooperative differential evolution (MCDE) algorithm is proposed. For parameter setting, the scaling factor F and the crossover probability CR are adjusted mainly following [24]. For the mutation strategy, MCDE selects “current-to-pbest/1,” “current-to-rand/1,” and “rand/1” as the mutation strategy group. In the initial phase, the evolutionary population is divided into multiple subpopulations, and one subpopulation is designated the experimental population, which is assigned the mutation strategy with the better evolutionary results. In the evolutionary phase, the global search capability is continuously promoted by repeatedly rotating the crossover coordinate system and coordinating among the subpopulations. In the end, the best remaining individual is taken as the optimal solution. Simulation tests were conducted on the CEC 2005 benchmark in 30 and 50 dimensions and compared with contemporary evolutionary algorithms; MCDE was found to perform significantly better.

The paper is organized as follows. Section 2 briefly introduces the standard DE algorithm. Section 3 elaborates on the algorithm improvement. Section 4 analyzes the significance of the proposed algorithm through experimental data. Section 5 gives a summary.

2. Standard Differential Evolution Algorithm

DE can be regarded as a greedy evolutionary algorithm based on real-number coding for global optimization. In the evolutionary phase, the three operations of mutation, crossover, and selection are iterated until the stop condition is satisfied. The fitness function f(x) is used to evaluate the quality of individuals, and the best individual is recorded.

2.1. Initialization

Assuming that the population size is NP and the dimension of the feasible solution space is D, x^G denotes the evolving population at generation G. Each individual is composed of D-dimensional parameters and can be expressed as

x_{i,j}^{0} = x_{L,j} + rand(0, 1) · (x_{H,j} − x_{L,j}),  i = 1, 2, …, NP,  j = 1, 2, …, D, (1)

where rand(0, 1) is a uniform random number and x_L and x_H represent the lower and upper bounds of the individual, respectively.
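As an illustration, the uniform initialization step can be sketched in NumPy; the function name and seeding convention are our own, not taken from the paper's code:

```python
import numpy as np

def initialize_population(NP, D, x_low, x_high, seed=None):
    """Uniform random initialization of NP individuals in [x_low, x_high]^D."""
    rng = np.random.default_rng(seed)
    return x_low + rng.random((NP, D)) * (x_high - x_low)
```

With the paper's setting NP = 250 and, say, D = 30 with bounds [−100, 100], this produces a 250 × 30 matrix of candidate solutions.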

2.2. Mutation Operation

Each individual in the parent population generates a mutant individual through a mutation strategy. “DE/rand/1” indicates that DE perturbs a randomly chosen individual. The expression is as follows:

v_i^G = x_{r1}^G + F · (x_{r2}^G − x_{r3}^G), (2)

where r1 ≠ r2 ≠ r3 ≠ i and r1, r2, and r3 are randomly generated individual indices. The scaling factor F is chosen from [0, 1].
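A minimal NumPy sketch of the “DE/rand/1” mutation follows; the function name and loop structure are illustrative:

```python
import numpy as np

def mutate_rand1(pop, F, seed=None):
    """DE/rand/1 mutation: v_i = x_r1 + F * (x_r2 - x_r3), r1 != r2 != r3 != i."""
    rng = np.random.default_rng(seed)
    NP = pop.shape[0]
    mutants = np.empty_like(pop)
    for i in range(NP):
        # pick three distinct indices, all different from the current index i
        choices = np.delete(np.arange(NP), i)
        r1, r2, r3 = rng.choice(choices, size=3, replace=False)
        mutants[i] = pop[r1] + F * (pop[r2] - pop[r3])
    return mutants
```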

2.3. Crossover Operation

The main function of the crossover operation is to cross the generated mutant individuals with the individuals in the original population to produce new trial individuals. DE adopts the binomial crossover scheme:

u_{i,j}^G = v_{i,j}^G if rand_j(0, 1) ≤ CR or j = j_rand; otherwise u_{i,j}^G = x_{i,j}^G, (3)

where rand_j(0, 1) is a uniform random number in [0, 1], j_rand is randomly chosen from {1, 2, …, D}, and the crossover probability CR lies in [0, 1].
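The binomial crossover can be sketched in vectorized form; the formulation and names below are our own:

```python
import numpy as np

def binomial_crossover(pop, mutants, CR, seed=None):
    """Binomial crossover: take the mutant gene when rand_j <= CR or j == jrand."""
    rng = np.random.default_rng(seed)
    NP, D = pop.shape
    take_mutant = rng.random((NP, D)) <= CR
    jrand = rng.integers(0, D, size=NP)
    take_mutant[np.arange(NP), jrand] = True  # guarantee at least one mutant gene
    return np.where(take_mutant, mutants, pop)
```

Forcing the j_rand coordinate ensures every trial vector differs from its target in at least one dimension, even when CR is very small.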

2.4. Selection Operation

The selection operation adopts the greedy, survival-of-the-fittest mode, so that the offspring is always superior or equal to the parent individual x_i. When the fitness value of the trial individual u_i is better than that of the target individual, u_i is accepted into the population. Otherwise, x_i remains in the next-generation population and continues as the target individual in the next iteration, so that the population always evolves toward the optimal solution. For a minimization problem, the selection operation is

x_i^{G+1} = u_i^G if f(u_i^G) ≤ f(x_i^G); otherwise x_i^{G+1} = x_i^G, (4)

where f(x) is the objective function to be optimized.
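A compact sketch of the greedy selection for minimization (the helper name and fitness-function interface are assumptions):

```python
import numpy as np

def greedy_select(pop, trials, f):
    """Keep the trial vector u_i whenever f(u_i) <= f(x_i) (minimization)."""
    f_pop = np.array([f(x) for x in pop])
    f_trial = np.array([f(u) for u in trials])
    keep_trial = f_trial <= f_pop
    return np.where(keep_trial[:, None], trials, pop)
```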

3. Multiscale Cooperative Differential Evolution Algorithm

In the proposed algorithm, we divide the whole population into multiple subpopulations and give corresponding mutation strategies. Then, the cross coordinate system of each subpopulation is established by covariance learning and parameter adaptation of evolutionary subpopulation. Finally, the obtained crossover individual is selected and the individuals with better fitness are retained to make the whole population search forward to the global optimal solution.

3.1. Multiscale Mutation Strategy Integration Method

In recent years, because different mutation strategies suit different optimization functions, some researchers have focused on methods combining multiple mutation strategies [23, 24]. Even for a specific optimization problem, the most appropriate mutation strategy may differ at different stages of evolution. The mutation strategy is therefore an important factor in ensuring good results in DE. Because the stages of evolution place different demands on the mutation strategy, this paper selects “current-to-pbest/1,” “current-to-rand/1,” and “rand/1” as the multiscale mutation strategy set. The individuals that “current-to-rand/1” and “rand/1” involve in mutation are all selected randomly, so global exploration can be performed in the early stage of evolution. “current-to-pbest/1” seeks the global optimum through the current best individuals; during evolution, it narrows the search range to the vicinity of the optimal solution and accelerates convergence.

Current-to-pbest/1:

v_i^G = x_i^G + F · (x_pbest^G − x_i^G) + F · (x_{r1}^G − x_{r2}^G), (5)

Current-to-rand/1:

v_i^G = x_i^G + rand · (x_{r1}^G − x_i^G) + F · (x_{r2}^G − x_{r3}^G), (6)

rand/1:

v_i^G = x_{r1}^G + F · (x_{r2}^G − x_{r3}^G), (7)

where x_pbest^G is uniformly chosen from the top 100p% individuals of the current population, with p ∈ (0, 1]. Because the three mutation strategies have their own advantages, their behaviors differ. Therefore, a population multiscale mechanism is introduced in this paper. The whole population Pop is divided into three subpopulations Pop1, Pop2, and Pop3. Pop1, with the largest population size, is designated the experimental population and is combined with the corresponding mutation strategy. During evolution, the experimental population is allocated to the mutation strategy with the better evolutionary results. The population structure is expressed as

NP_j = ρ_j · NP,  j = 1, 2, 3, (8)

where Pop1 is assumed to be the experimental population, NP is the total population size, ρ_j ∈ [0, 1] is the size ratio of subpopulation j, and ρ_1 + ρ_2 + ρ_3 = 1.
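As an illustration of the greediest member of the strategy set, “current-to-pbest/1” can be sketched in NumPy; the function signature and the rounding used to select the top 100p% individuals are our assumptions:

```python
import numpy as np

def mutate_current_to_pbest1(pop, fitness, F, p, seed=None):
    """current-to-pbest/1: v_i = x_i + F*(x_pbest - x_i) + F*(x_r1 - x_r2),
    where x_pbest is drawn from the best 100p% individuals (minimization)."""
    rng = np.random.default_rng(seed)
    NP = pop.shape[0]
    n_top = max(1, int(round(p * NP)))
    top = np.argsort(fitness)[:n_top]          # indices of the best individuals
    mutants = np.empty_like(pop)
    for i in range(NP):
        pbest = rng.choice(top)
        choices = np.delete(np.arange(NP), i)
        r1, r2 = rng.choice(choices, size=2, replace=False)
        mutants[i] = pop[i] + F * (pop[pbest] - pop[i]) + F * (pop[r1] - pop[r2])
    return mutants
```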

After the population structure is designed, the allocation rule for the subpopulations should be given. First, the subpopulations Pop1, Pop2, and Pop3 are each paired with a mutation strategy. Then, each subpopulation undergoes the mutation, crossover, and selection operations. Finally, the total number bd_i of superior individuals retained after the evolution of subpopulation i is counted, and the superior rate br_i of the subpopulation is

br_i = bd_i / NP_i,  i = 1, 2, 3. (9)

The superior rate br_i of each subpopulation is calculated in every generation, and in the initialization stage of the next generation the subpopulations are reallocated to the three mutation strategies according to br_i. The multiscale mutation strategy set makes full use of the advantages of the three mutation strategies to balance population diversity against convergence speed, as the later experimental results show. In the first generation, the subpopulations are randomly assigned mutation strategies. At the end of the first generation, the superior rate of each subpopulation is calculated by equation (9); the maximum superior rate identifies the best mutation strategy of that generation. Assume that the strategy “current-to-pbest/1” has the highest superior rate in the first generation; the second generation then assigns Pop1 to it, while the remaining subpopulations Pop2 and Pop3 are each randomly assigned one of the remaining strategies (“current-to-rand/1” or “rand/1”).
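The reallocation rule above can be sketched as follows; the helper names are hypothetical, and the tie-breaking behavior (first maximum wins) is our assumption:

```python
import random

def superior_rates(bd, sizes):
    """Superior rate br_i = bd_i / NP_i for each subpopulation."""
    return [b / n for b, n in zip(bd, sizes)]

def assign_strategies(rates, strategies, seed=None):
    """Give the largest subpopulation (Pop1) the strategy with the highest
    superior rate; the remaining strategies are shuffled over Pop2 and Pop3."""
    rng = random.Random(seed)
    best = max(range(len(rates)), key=lambda i: rates[i])
    rest = [s for i, s in enumerate(strategies) if i != best]
    rng.shuffle(rest)
    return [strategies[best]] + rest
```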

3.2. Covariance Learning

The aforementioned crossover operators depend mainly on the coordinate system, while the distribution information of the population reflects the direction of evolution to some extent [20]. During evolution, the distribution of the population is often neglected, which may cause the population to fall into a local optimum and converge prematurely. In this paper, variance and covariance are used to analyze the population distribution, forming a covariance matrix that reflects population diversity information. The systematic use of the covariance matrix reduces the dependence on the coordinate system and the interaction between variables. Covariance matrix learning involves two related techniques: eigendecomposition of the covariance matrix and coordinate transformation. The steps are as follows:

Step 1. Calculate the covariance matrix C of the subpopulation:

C = cov(Pop_j). (10)

Step 2. Obtain the eigenvalues λ and the eigenvector matrix R of C by eigendecomposition:

C = R · diag(λ) · R^T. (11)

Step 3. Update the target individual and the mutant individual in the eigenvector-based coordinate system.

Step 4. Retain the individuals with better fitness after the crossover and selection operations and rotate them back to the original coordinate system.
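The four steps above can be sketched with NumPy; the function names are our own, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric:

```python
import numpy as np

def eigen_rotation(pop):
    """Steps 1-2: covariance matrix of the subpopulation and its eigenvectors R."""
    C = np.cov(pop, rowvar=False)          # Step 1: rows are individuals
    eigvals, R = np.linalg.eigh(C)         # Step 2: C = R diag(eigvals) R^T
    return C, eigvals, R

def to_eigen_coords(pop, R):
    """Step 3: express individuals in the eigen (feature) coordinate system."""
    return pop @ R

def to_original_coords(pop_rot, R):
    """Step 4: rotate the retained individuals back to the original system."""
    return pop_rot @ R.T
```

In the eigen coordinate system the coordinates are decorrelated, which is what weakens the coordinate-system dependence of binomial crossover.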

Based on the above four steps, the population feature coordinate system is established. Figure 1(a) shows the initial coordinate system of population evolution, and Figure 1(b) shows the feature coordinate system. By analyzing the population features, we obtain the ox1x2 coordinate system, in which the global optimum can be found faster.

3.3. Adaptive Control Parameter Settings

Researchers have proposed many effective parameter adaptation methods [21, 23, 24]. Different combinations of control parameters and mutation strategies yield different results on a given optimization problem. In this paper, each scale strategy has its own control parameters, and different techniques are applied within the algorithm. The method in [24] suits the proposed algorithm well, so it is adopted here with some improvements.

During evolution, the scaling factor F plays a decisive role in the search range around the base vector. In the standard DE algorithm, F is a fixed value, which cannot suit all global optimization functions. In this paper, the scaling factor is sampled through the Cauchy inverse cumulative distribution function. Let F_{i,j} denote the scaling factor of individual i in subpopulation j:

F_{i,j} = randc(Fm_j, 0.1), (12)

where randc denotes a Cauchy random number, Fm_j is the location parameter of the Cauchy distribution and the current scaling-factor mean, initialized to 0.5, and 0.1 is the scale parameter. To better adapt to population evolution, a weighting factor c is introduced to combine the parent factor with the next-generation factor. The current Fm_j is updated as

Fm_j = (1 − c) · Fm_j + c · pow_n(S_{F,j}), (13)

where S_{F,j} is the set of successful parental scaling factors and pow_n denotes the power mean:

pow_n(S_F) = ( (1/|S_F|) Σ_{F ∈ S_F} F^n )^{1/n}, (14)

where n is the exponent of the power mean, which quantifies the influence of the parents’ scaling factors on the offspring.

In the DE algorithm, the crossover probability CR determines the probability that a target individual inherits genes from the mutant individual. In this paper, the crossover probability is sampled from a normal distribution. Let CR_{i,j} denote the crossover probability of individual i in subpopulation j:

CR_{i,j} = randn(CRm_j, 0.1), (15)

where CRm_j is the mean of the individual crossover probabilities, initialized to 0.5, and the standard deviation of the normal distribution is set to 0.1. To better inherit the parent genes, the weighting factor c combines the parent crossover probability with the next-generation crossover probability:

CRm_j = (1 − c) · CRm_j + c · mean_L(S_{CR,j}), (16)

where S_{CR,j} is the set of successful parental crossover probabilities and mean_L denotes the Lehmer mean:

mean_L(S_CR) = ( Σ_{CR ∈ S_CR} CR² ) / ( Σ_{CR ∈ S_CR} CR ). (17)
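A corresponding sketch of the CR adaptation; truncating the normal samples to [0, 1] is our assumption, and the names are illustrative:

```python
import numpy as np

def sample_CR(CRm, size, seed=None):
    """Draw CR from N(CRm, 0.1^2), truncated to [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(rng.normal(CRm, 0.1, size), 0.0, 1.0)

def lehmer_mean(SCR):
    """mean_L(S) = sum(s^2) / sum(s), weighting larger values more heavily."""
    s = np.asarray(SCR, dtype=float)
    return float((s ** 2).sum() / s.sum())

def update_CRm(CRm, SCR, c=0.1):
    """CRm <- (1 - c) * CRm + c * mean_L(SCR); SCR holds successful CRs."""
    if len(SCR) == 0:
        return CRm
    return (1.0 - c) * CRm + c * lehmer_mean(SCR)
```

Because the Lehmer mean exceeds the arithmetic mean, the update drifts CRm toward the larger successful crossover probabilities.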

The Lehmer mean method can flexibly adjust the value of CR according to the parent crossover probabilities.

3.4. Algorithm Framework

The proposed algorithm combines multiscale strategy and covariance learning and introduces adaptive control parameters to lead the population to keep close to the global optimum. Based on the above analysis, the basic flow of MCDE is summarized as Algorithm 1.

Input: population size NP, dimension D, maximum number of evaluations MaxFES; the initial scaling factor Fmj and the initial crossover probability CRmj; weighting factor c = 0.1, power mean exponent n = 4; initialize the populations Pop1, Pop2, and Pop3 according to equation (8);
Begin:
 Evolutionary generation G = 0, current evaluation times FES = 0;
 While FES ≤ MaxFES
 For j = 1 to 3
  If G > 0
   Calculate Fmj, CRmj according to equations (13), (14), (16), and (17);
  End if
  For i = 1 to NPj
   Calculate Fi,j and CRi,j according to equations (12) and (15);
   Perform mutation strategies according to equations (5), (6), and (7), respectively;
   Perform covariance learning according to equations (10) and (11);
  End for
  For i = 1 to NPj
   If f(ui) ≤ f(xi)
    xi = ui;
    SF,j = Fi,j, SCR,j = CRi,j;
   Else
    xi remains unchanged;
   End if
   FES = FES + NPj;
  End for
 End for
 G = G + 1;
 According to equation (9), count the superior rate of each subpopulation and reallocate the subpopulations;
 End while
End
Output: the individual with the minimum objective function value in the population.

4. Experimental Results and Analysis

MCDE is tested on the 25 benchmark functions of IEEE CEC 2005, which comprise the unimodal functions F1–F5, the basic multimodal functions F6–F12, the extended multimodal functions F13–F14, and the complex functions F15–F25; see [45] for details. The parameters of MCDE are set as follows: population size NP = 250 and subpopulation ratios ρ1 = 0.6 and ρ2 = ρ3 = 0.2. Experimental environment: Windows 7 Professional 64-bit, Intel Core i7 CPU (3.40 GHz), 8 GB RAM, MATLAB R2014b.

4.1. Comparison with Improved DE Algorithm

To verify the performance of MCDE, it is compared with seven well-known improved DE algorithms: JADE [24], jDE [21], SaDE [23], EPSDE [17], CoDE [18], CoBiDE [31], and LSHADE [46]. JADE and jDE are representative and heavily cited algorithms, SaDE and EPSDE are multistrategy improvements, and CoDE and CoBiDE are improvements based on population structure. The experimental results of these algorithms are shown in Table 1, where D = 30 and MaxFES = 300000. The values in the table are given as mean error ± standard deviation, and the symbols indicate that a comparison algorithm is significantly better than, worse than, or similar to MCDE, respectively. Based on the data in Table 1 and Figure 2, we can draw the following conclusions. (1) Unimodal functions F1–F5: among the comparison algorithms, JADE and LSHADE perform best; thanks to the greedy “current-to-pbest/1” strategy, they achieve fast convergence and high precision. However, the multiscale strategy of MCDE achieves better accuracy than JADE on the benchmark functions F3 (a), F4 (b), and F5 (c). (2) Basic multimodal functions F6–F12: the best-performing comparison algorithm is CoBiDE, which beats MCDE on F6, F8, F9, and F11 but is worse than MCDE on F7 (d), F10 (e), and F12 (f). Overall, MCDE is similar to CoBiDE on this type of benchmark function. (3) Extended multimodal functions F13–F14 (g): the average errors of the algorithms are of the same order of magnitude, but JADE, CoDE, and MCDE are slightly better than the remaining algorithms. (4) Complex functions F15–F25 (h): MCDE is significantly better than JADE, jDE, CoDE, CoBiDE, and LSHADE and slightly better than SaDE and EPSDE.

Based on the above four conclusions and Wilcoxon’s test, it can be concluded that the MCDE shows significant effects in the four types of benchmark functions. Finally, the experimental results of the MCDE in D = 30 are better than those of JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE on 13, 13, 15, 18, 10, 13, and 11 benchmark functions, respectively, worse than other comparison algorithms on 3, 3, 5, 6, 5, 4, and 7 benchmark functions, and similar to other comparison algorithms on 9, 9, 5, 1, 10, 8, and 7 benchmark functions.

From Table 2, it can be seen that under Wilcoxon’s test at α = 0.05 and α = 0.1, MCDE is more effective than JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE. According to Friedman’s average ranking (D = 30) in Table 3, MCDE performs well on all types of benchmark functions and achieves the best ranking.

The experimental results of the algorithms with D = 50 and MaxFES = 500000 are shown in Table 4. Based on the data in Table 4 and Figure 3, we can draw the following conclusions. (1) Unimodal functions F1–F5: at D = 50, MCDE outperforms the other algorithms on F4 (b) and F5 (c), which shows that the search scope of its mutation strategies is wider and its adaptive parameter control performs better; its results on F2 and F3 are inferior only to JADE and LSHADE. (2) Basic multimodal functions F6–F12: CoBiDE and LSHADE perform markedly better on this type of benchmark function, whereas MCDE is slightly better on F6 (d), F7 (e), F10 (f), and F12 (g). (3) Extended multimodal functions F13–F14 (h): the accuracy of MCDE is better than that of the other algorithms, mainly because the proposed algorithm adopts a multiscale strategy to search for the best results. (4) Complex functions F15–F25 (i), (j): these are the most complex of the benchmark functions, eleven in all. The results of JADE, jDE, SaDE, CoDE, and LSHADE are not significant, and the results of EPSDE are not stable, while MCDE, JADE, and CoBiDE perform better.

Based on the above four conclusions and Wilcoxon’s test, it can be concluded that MCDE shows significant effects on the four types of benchmark functions. At D = 50, the experimental results of MCDE are better than those of JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE on 18, 20, 21, 17, 22, 13, and 15 benchmark functions, worse on 4, 2, 2, 7, 2, 6, and 7 benchmark functions, and similar on 3, 3, 2, 1, 1, 6, and 3 benchmark functions, respectively.

From Table 5, it can be seen that under Wilcoxon’s test at α = 0.05, MCDE is more effective than JADE, jDE, SaDE, EPSDE, CoDE, and LSHADE, while the p value for CoBiDE is 0.06; at α = 0.1, MCDE differs significantly from all the other algorithms. Based on the Friedman average ranking at D = 50 in Table 6, MCDE performs well on all types of benchmark functions and achieves the top ranking.

4.2. Comparison with Related Evolutionary Algorithms

To further evaluate MCDE, it is compared with CLPSO [47], CMA-ES [48], and GL-25 [25, 26]. CLPSO is a local version of PSO that adopts a new learning strategy. CMA-ES uses a covariance matrix adaptation mechanism and is mainly applied to continuous optimization problems. GL-25 is a global and local real-coded genetic algorithm based on a new crossover operator. The experimental results of MCDE, CLPSO, CMA-ES, and GL-25 at D = 30 and MaxFES = 300000 are shown in Table 7. MCDE has the most prominent effect on the unimodal functions (F2–F5), with a smaller average error than the other evolutionary algorithms. On the basic multimodal functions F10 and F12, the results of MCDE are significantly better than those of the other algorithms. On the extended multimodal and complex functions, MCDE achieves significant results on most functions (F14, F16, F17, F21, F23, F24, and F25). Overall, at D = 30 the experimental results of MCDE are better than those of CLPSO, CMA-ES, and GL-25 on 19, 15, and 21 benchmark functions, worse on 2, 5, and 1 benchmark functions, and similar on 4, 5, and 3 benchmark functions, respectively.

From Table 8, it can be seen that under Wilcoxon’s test at α = 0.05 and α = 0.1, the p values of MCDE are below 0.05 and 0.1, so the effect is most significant. According to the Friedman average ranking at D = 30 in Table 9, MCDE performs best on the benchmark functions.

4.3. Runtime Comparison and Mechanism Comparison

In general, the running time of an evolutionary algorithm comprises the time consumed by its operators and the time spent evaluating the fitness function. JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and the proposed algorithm were each run 25 times independently on the 25 benchmark functions with MaxFES = 300000 and D = 30, and the average CPU time consumed was recorded. To compare running speed, this paper uses the mean CPU time ratio (AR) between each algorithm and MCDE: AR > 1 means the algorithm runs slower than MCDE, and AR < 1 means it runs faster.

From the average AR values in Table 10, which range over [0.85, 13.41], jDE runs fastest and EPSDE slowest, with the proposed algorithm ranked third. The proposed algorithm is slower than jDE and JADE because the multiscale strategies widen the search range but consume more time in the mutation step.

By adding experiments with and without the multipopulation mechanism and covariance learning, it can be concluded from Table 11 and Figure 4 that the multiscale mechanism (DE-1) is outstanding on the unimodal function F4 (a), while covariance learning (DE-2) performs notably on the structurally complex basic multimodal and complex functions F10 (b), F16 (c), and F17 (d). In the full algorithm, the multipopulation structure combines each subpopulation with a corresponding mutation strategy to preserve individual diversity during evolution, covariance learning establishes a proper rotated coordinate system for the crossover operation, and the adaptive control parameters balance exploration and convergence.

5. Conclusions

MCDE introduces multiscale strategies, including local and global mutation strategies, to expand the population search scope. During evolution, the initial coordinate system is rotated by covariance matrix learning to transform the target and mutant individuals. The successful crossover probability CR and scaling factor F are inherited from the previous generation through the Lehmer mean and the power mean, respectively. The proposed algorithm is compared with JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE on the CEC 2005 benchmark functions, showing significant effects on global optimization problems at D = 30 and D = 50. To further verify the algorithm, it is compared with the evolutionary algorithms CLPSO, CMA-ES, and GL-25 at D = 30 and is found to work best. In terms of running time, the proposed algorithm ranks in the upper half of the compared algorithms. In summary, both accuracy and convergence speed are improved, so MCDE is effective and practical.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Yongzhao Du and Yuling Fan contributed equally.

Acknowledgments

This study was supported by the Promotion Program for Young and Middle-Aged Teachers in Science and Technology Research of Huaqiao University (no. ZQN-PY518) and the grants from National Natural Science Foundation of China (grant nos. 61605048 and 61603144). This study was also supported in part by the Natural Science Foundation of Fujian Province, China (grant nos. 2015J01256 and 2016J01300), Quanzhou Scientific and Technological Planning Projects of Fujian, China (grant nos. 2015Z120 and 2017G024), and postgraduate research and innovation ability training program funds of Huaqiao University (grant no. 1611422002).