Abstract

In evolutionary algorithms, genetic operators iteratively generate new offspring which constitute a potentially valuable set of search history. To boost the performance of offspring generation in the real-coded genetic algorithm (RCGA), in this paper, we propose to exploit the search history cached so far in an online style during the iteration. Specifically, survivor individuals over the past few generations are collected and stored in the archive to form the search history. We introduce a simple yet effective crossover model driven by the search history (abbreviated as SHX). In particular, the search history is clustered, and each cluster is assigned a score for SHX. In essence, the proposed SHX is a data-driven method which exploits the search history to perform offspring selection after the offspring generation. Since no additional fitness evaluations are needed, SHX is favorable for the tasks with limited budget or expensive fitness evaluations. We experimentally verify the effectiveness of SHX over 15 benchmark functions. Quantitative results show that our SHX can significantly enhance the performance of RCGA, in terms of both accuracy and convergence speed. Also, the induced additional runtime is negligible compared to the total processing time.

1. Introduction

Evolutionary algorithms (EAs) have been shown, both theoretically [1–3] and practically [4–6], to be generic and effective for finding global optima in complex search spaces. The exploration process of EAs imitates natural selection and is realized by alternating offspring generation and survivor selection iteratively. The population quality gradually improves throughout this process, which can be viewed as a stochastic, population-based generate-and-test procedure. Offspring generation samples a large number of candidate solutions (i.e., individuals), each accompanied by a fitness value, genetic information, and genealogy information. Such accumulated search data constitute a search history which can be very informative and valuable for boosting the overall performance. For instance, exploiting the search history can improve the search procedure under a limited budget of fitness evaluations (FEs), where no additional FEs are allowed for improving the search performance. Also, the computational cost of a single FE can be high when the fitness function is complicated. To obtain a better solution without increasing the number of FEs, the way the search history is exploited truly matters. Nevertheless, search history has been sparsely exploited and studied in existing methods.

The real-coded genetic algorithm (RCGA) has been widely studied in the past decades [7–11], and the main efforts for improving its performance have focused on the development of crossover techniques [12]. Because the crossover operator generates new offspring from the current population, the quality of the new solutions directly affects the evolution direction and convergence speed. Depending on their mechanisms, crossover methods can differ in (1) parent selection, (2) offspring generation, and (3) offspring selection. The numbers of parents and offspring can both exceed two, depending on the design. These three aspects tie the exploration ability to the exploitation ability, and the degree of and balance between the two abilities largely affect the performance [13]. Although the self-adaptive feature of RCGA [14] can adjust this relationship to a certain extent, the “best” degrees of and balance between exploration and exploitation for achieving a satisfactory solution can differ greatly across problem settings and can hardly be achieved with the adaptive feature alone.

With a large amount of search history data up to the current generation in hand, we introduce in this paper a crossover method that effectively exploits the history data. First, an archive is defined to collect the survivor individuals over generations as the search history. Then, the stored individuals are clustered by k-means [15], and each cluster is assigned a score depending on the number of individuals belonging to it. Finally, offspring are generated and selected according to the scores. We introduce two different schemes to update the archive. The proposed crossover operator, named search history-driven crossover (SHX), generates offspring by considering the cluster scores. Since SHX provides an offspring selection mechanism, any existing parent selection and offspring generation mechanisms can be easily integrated with it. To our knowledge, this is the first work to design the crossover model by effectively exploiting search history. We present a set of experiments to systematically evaluate the effectiveness of the proposed method using 15 benchmark functions. Three conventional crossover operators are employed, and the results with and without SHX are compared. In addition, the two archive update methods are analyzed.

The main technical contributions of this paper are threefold. First, we propose a novel crossover model by effectively exploiting the search history. Second, we introduce the offspring selection based on the clusters calculated from the search history. Third, we introduce two schemes to update the survivor archive. A preliminary version of this paper appears in GECCO2020 [16].

2. Related Work

Crossover is one of the principal operators for generating offspring and is closely related to the performance of the real-coded genetic algorithm (RCGA). Blend-$\alpha$ crossover (BLX-$\alpha$) [17] proposed by Eshelman and Schaffer is one of the most popular operators. Offspring genes are independently and uniformly sampled within an interval between a gene pair of parents. The parameter $\alpha$ corresponds to the extension of the sampling interval, which plays a key role in maintaining the diversity of offspring. Eshelman et al. proposed Blend-$\alpha$-$\beta$ crossover (BLX-$\alpha$-$\beta$) [18] which involves two extension parameters. Deb and Agrawal introduced simulated binary crossover (SBX) [19] which simulates the single-point crossover in binary-coded GA for continuous search space. The interval used in SBX is determined by a polynomial probability distribution depending on the distribution index $\eta$, which indirectly adjusts the tendency of offspring generation. The above crossover operators share the feature that the offspring genes are drawn according to a certain probability distribution from a predefined interval on the parent genes. This feature yields better results in the continuous search space than crossover operators designed for binary coding. On the other hand, some crossover operators use more than two individuals as parents, aiming to generate offspring with well-preserved population statistics. In the case of unimodal normal distribution crossover (UNDX) [20], the generation of offspring follows a unimodal normal distribution defined on the line connecting two of the three parents. For simplex crossover (SPX) [21], $n + 1$ individuals are taken as parents in the $n$-dimensional search space. SPX uniformly generates offspring within the $n$-dimensional simplex constructed by the parent individuals and expanded by a parameter $\varepsilon$.
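As an illustration of interval-based sampling, the following is a minimal sketch of a blend crossover in plain Python. The function name `blx_alpha`, the default `alpha=0.5`, and the list-of-floats representation of an individual are assumptions made for this example, not details taken from the cited works.

```python
import random


def blx_alpha(parent_a, parent_b, alpha=0.5, rng=random):
    """Sample one child gene by gene from the extended interval between two parents."""
    child = []
    for a, b in zip(parent_a, parent_b):
        low, high = min(a, b), max(a, b)
        # alpha controls how far the sampling interval extends beyond the parents.
        extent = alpha * (high - low)
        child.append(rng.uniform(low - extent, high + extent))
    return child


if __name__ == "__main__":
    random.seed(0)
    print(blx_alpha([0.0, 2.0], [1.0, 2.0]))
```

Because every gene is sampled independently, the child can fall anywhere in an axis-aligned box around the two parents, which is the parameter-wise behavior discussed for BLX in Section 6.2.1.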

Search history has also been exploited in some research, but to the best of our knowledge, none of them is for the purpose of improving the crossover model. Since online real systems often provide uncertain evaluation values which lead to unreliable convergence of GA, Sano and Kita proposed memory-based fitness estimation GA (MFEGA) [22]. MFEGA estimates the fitness from neighboring individuals stored in the search history. Leveraging search history allows estimation without requiring additional evaluation. Amor and Rettinger proposed GA using self-organizing maps (GASOM) [23]. SOM (self-organizing maps) can provide a visualized search history, which makes the regions explored intuitive for users. Moreover, individual novelty is introduced by the activation frequency in the search history table and utilized by the reseeding operator to preserve the exploration power. Yuen and Chow presented the continuous nonrevisiting GA (cNrGA) [24]. A binary partitioning tree called a density tree stores all evaluated individuals and divides the search space into nonoverlapped partitions by means of distributions. These subregions are used to check whether a new individual needs to be evaluated or not.

3. Overview

Principles of designing good crossover operators for RCGA are discussed in [25]. Two among them are especially important: (1) the crossover operator should preserve the statistics of the population; (2) the crossover operator should generate offspring with as much diversity as possible under the constraint of (1). By following these suggestions, the key idea of SHX is to cluster the search history and select population members from excessively generated candidate solutions by preserving the statistics represented by the clusters. Figure 1 illustrates the overview of our SHX. The proposed method is performed under the framework of RCGA which mainly involves survivor selection and crossover. Mutation is optional, but we exclude it to clearly investigate the effectiveness of SHX in this work.

The proposed method is described in Algorithm 1. The population is denoted by $P$, which comprises $N_P$ individuals, and the population at the $t$-th generation is denoted as $P_t$. Similarly, parents for SHX, excessively generated candidate solutions during SHX, offspring after SHX, and survivors for the next generation are represented by $R$, $C$, $O$, and $S$, respectively. The size of each set is denoted using $N$ with a subscript of the set name (e.g., the size of parents is denoted by $N_R$). In addition to $P$, our method manages an archive $A$ which preserves survivors throughout the generation alternation. $P$ and $A$ are initialized by randomly placing individuals in the search space. The archive update process is conducted after the survivor selection. Survivor individuals of the current generation are aggregated into both $P$ and $A$ of the next generation. SHX can be further divided into parent selection, offspring generation, and offspring selection. Different from conventional RCGA, individuals generated from $R$ are regarded as offspring candidates $C$. The main purpose of SHX is to narrow down $C$ to $N_O$ individuals denoted by $O$ according to the statistics $H$ provided by $A$. $H$ is calculated from the clustering result of the archive and immediately impacts the offspring selection.

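Algorithm 1: Overall procedure of RCGA with SHX.
Input: population $P$, population size $N_P$,
Archive $A$, archive size $N_A$,
Score $H$,
Generation ID $t$
Output: estimated global optimum
//initialization
(1) $N_P$ individuals in $P_t$ are randomly initialized
//fitness evaluation
(2) eval($P_t$)
//archive update
(3) $A_t, H_t \leftarrow$ archiveUpdate($A_t$, $P_t$, $N_A$);
(4) While termination criterion is not satisfied do
(5) $t \leftarrow t + 1$
//SHX
(6) $R_t \leftarrow$ parentSelection($P_{t-1}$, $N_R$);
(7) $P_t \leftarrow P_{t-1} \setminus R_t$
(8) $C_t \leftarrow$ offspringGeneration($R_t$);
(9) $O_t \leftarrow$ offspringSelection($C_t$, $H_{t-1}$);
//fitness evaluation
(10) eval($O_t$)
//survivor selection
(11) $S_t \leftarrow$ survivorSelection($O_t$);
(12) $P_t \leftarrow P_t \cup S_t$
//archive update
(13) $A_t, H_t \leftarrow$ archiveUpdate($A_{t-1}$, $S_t$, $N_A$);
(14) end
(15) Return the best individual in $P_t$.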

SHX can adopt any existing crossover operator (e.g., BLX-$\alpha$ [17] and SPX [21]) for the offspringGeneration function (Algorithm 1, line 8) to generate $C$ from $R$. For the parentSelection function (Algorithm 1, line 6) and the survivorSelection function (Algorithm 1, line 11), the just generation gap (JGG) [26, 27] is employed in this work. That is, the parentSelection function randomly extracts $N_R$ individuals from $P$ as $R$, and the survivorSelection function selects the top-$N_S$ individuals in $O$ as $S$ according to the fitness value. To show the performance increase brought by SHX, we choose the widely applied BLX-$\alpha$, SPX, and UNDX for the offspring generation and compare the results in Section 6. We explain archiveUpdate (Algorithm 1, lines 3 and 13) and offspringSelection (Algorithm 1, line 9) in detail in Section 4 and Section 5, respectively.
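The JGG-style parent and survivor selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the list-based population, the explicit `fitness` callable, and the minimization convention are choices made for this example rather than details given in the text.

```python
import random


def parent_selection(population, n_parents, rng=random):
    """Randomly extract n_parents individuals; they leave the population."""
    chosen = set(rng.sample(range(len(population)), n_parents))
    parents = [ind for i, ind in enumerate(population) if i in chosen]
    remaining = [ind for i, ind in enumerate(population) if i not in chosen]
    return parents, remaining


def survivor_selection(offspring, fitness, n_survivors):
    """Keep the n_survivors offspring with the lowest fitness (minimization assumed)."""
    return sorted(offspring, key=fitness)[:n_survivors]


if __name__ == "__main__":
    random.seed(1)
    population = [[float(i)] for i in range(6)]
    parents, remaining = parent_selection(population, 2)
    print(parents, remaining)
    print(survivor_selection([[3.0], [-1.0], [0.5]], lambda x: x[0] ** 2, 2))
```

In this sketch the survivors are chosen among the offspring only, so the extracted parents are replaced rather than competing with their children.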

4. Survivor Archive

Since the genetic operations are run alternately and iteratively, collecting and analyzing the history data may be beneficial for boosting performance. Given that SHX aims to maintain the historical statistics while producing offspring for the next generation, the archive $A$ is designed to store the survivors $S$ of the past few generations and to extract the statistics $H$. The calculation of $H$ is based on k-means, an off-the-shelf unsupervised clustering method. The pseudocode of k-means is shown in Algorithm 2. In particular, k-means is employed to cluster the individuals in $A$ based on their positions in the search space, and $H$ is a normalized frequency histogram showing the proportion of each cluster size to $N_A$. A higher score indicates that the corresponding cluster is more likely to be a promising search region. The statistics can then be maintained by probabilistically assigning newly generated candidates to each cluster according to $H$.

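Algorithm 2: k-means clustering.
Input: number of clusters $k$,
Data points $x_1, \ldots, x_n$
Output: cluster centroids $\mu_1, \ldots, \mu_k$
(1) $k$ cluster centroids $\mu_1, \ldots, \mu_k$ are randomly initialized
(2) While termination criterion is not satisfied do
(3) For $i = 1, \ldots, n$ do
(4) assign the nearest cluster centroid ID to $x_i$
(5) end
(6) For $j = 1, \ldots, k$ do
(7) update $\mu_j$ by calculating the mean of data points in the $j$-th cluster
(8) end
(9) end
(10) Return $\mu_1, \ldots, \mu_k$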
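The clustering step and the normalized frequency histogram can be sketched together in plain Python. This is a minimal sketch and not the authors' implementation: the names `kmeans_fit` and `cluster_scores`, the fixed iteration count used as the termination criterion, and the initialization from randomly sampled data points are all assumptions made for this example.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans_fit(points, k, iterations=20, rng=random):
    """Plain k-means with a fixed number of iterations; returns (centroids, labels)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Final assignment so that labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def cluster_scores(labels, k):
    """Normalized frequency histogram: share of archive members in each cluster."""
    return [labels.count(j) / len(labels) for j in range(k)]


if __name__ == "__main__":
    random.seed(0)
    archive = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
    centroids, labels = kmeans_fit(archive, 2)
    print(cluster_scores(labels, 2))
```

The scores sum to one by construction, so they can be used directly as selection probabilities for the clusters.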

To keep the computational cost brought by k-means within an acceptable and constant range, the archive size is fixed to $N_A$. That is, a part of the individuals in $A$ must be replaced with new survivors during the archive update to incorporate new information. Two types of update methods are considered in this work: (1) randomly selecting individuals in $A$ and replacing them with $S$ (denoted by random); (2) replacing a part of $A$ with $S$ in the order in which the individuals of $A$ arrived (denoted by sequential). The performance comparison between these two approaches is discussed in Section 6.
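The two update schemes can be sketched as follows. The function names and the use of a bounded `collections.deque` for the first-in-first-out variant are assumptions made for this illustration; the sketch also assumes that the number of survivors does not exceed the archive size.

```python
import random
from collections import deque


def update_random(archive, survivors, rng=random):
    """Overwrite randomly chosen archive slots with the new survivors."""
    slots = rng.sample(range(len(archive)), len(survivors))
    for slot, individual in zip(slots, survivors):
        archive[slot] = individual
    return archive


def update_sequential(archive, survivors):
    """First in, first out: a deque created with maxlen drops its oldest entries."""
    archive.extend(survivors)
    return archive


if __name__ == "__main__":
    random.seed(0)
    print(update_random([0, 1, 2, 3, 4], ["x", "y"]))
    print(list(update_sequential(deque([0, 1, 2, 3, 4], maxlen=5), ["x", "y"])))
```

Both variants keep the archive size constant, so the cost of the subsequent clustering step does not grow over the generations.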

The update of $A$ and the calculation of $H$ are executed in the function archiveUpdate (Algorithm 1, lines 3 and 13), which is summarized in Algorithm 3. At the replacement step (Algorithm 3, line 4), $N_S$ individuals are discarded from $A$ based on the random or sequential approach, and the new survivors $S$ are stored in $A$. Initialization is executed when $t$ equals 0. The k-meansFit function (Algorithm 3, line 7) updates the centroids of the clusters according to the updated $A$ and assigns updated cluster labels to each individual in $A$. After that, the normalized frequency histogram $H$ over the clusters is calculated by the hist function (Algorithm 3, line 8) for further usage in offspring selection (Algorithm 4). Note that the initial centroids of the clusters in the current generation are inherited from the previous generation, as most individuals in $A_t$ are the same as those in $A_{t-1}$.

Algorithm 3: Archive update.
Function: archiveUpdate($A$, $S$, $N_A$)
Input: archive $A$,
Survivors $S$,
Size of the archive $N_A$
Output: updated archive $A$,
Score $H$
(1) If $t = 0$ then
//initialization
(2) $N_A$ individuals in $A$ are randomly initialized
(3) Else
//archive update
(4) randomly or sequentially (first in first out) select $N_S$ individuals from $A$ to form $A'$
(5) $A \leftarrow (A \setminus A') \cup S$
(6) end
//score update
(7) k-meansFit($A$)
(8) $H \leftarrow$ hist($A$); //calculating frequency histogram
(9) Return $A$, $H$
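
Algorithm 4: Offspring selection.
Function: offspringSelection($C$, $H$)
Input: candidates $C$,
Score $H$
Output: offspring $O$
//labeling based on clustering results estimated in Algorithm 3, line 7
(1) k-meansPredict($C$);
//roulette construction
(2) For $i = 1, \ldots, k$ do
(3) If no candidate in $C$ is labeled with cluster $i$ then
(4) set the roulette proportion of cluster $i$ to 0
(5) Else
(6) set the roulette proportion of cluster $i$ to the score of cluster $i$ in $H$
(7) end
(8) end
//offspring selection
(9) Repeat
(10) select one cluster ID $i$ by the roulette selection based on the roulette proportions
(11) randomly select one candidate $c$ from the candidates in $C$ labeled with cluster $i$,
(12) $O \leftarrow O \cup \{c\}$
//exclude selected candidate from clusters
(13) $C \leftarrow C \setminus \{c\}$
(14) If no candidate in $C$ is labeled with cluster $i$ then
(15) set the roulette proportion of cluster $i$ to 0
(16) end
(17) Until $N_O$ times are run
(18) Return $O$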

5. Search History-Driven Crossover (SHX)

SHX randomly selects parents by following the strategy of existing crossover operators (e.g., two parents in the case of BLX and $n + 1$ parents in the case of SPX) and excessively generates candidate offspring $C$ for further offspring selection. $N_C$ is set larger than $N_O$ because $C$ must contain a sufficient number of individuals that can be assigned to each cluster in $A$. Generating individuals excessively can also be considered a mechanism of diversity preservation. It is worth pointing out that offspring selection is a different procedure from survivor selection. Offspring selection belongs to the crossover model and is conducted before fitness evaluation, whereas survivor selection is conducted after fitness evaluation. Offspring selection narrows down $C$ to $O$ based on roulette wheel selection [28]. Each proportion of the wheel corresponds to one possible selection (i.e., one cluster), and $H$ is used to associate a probability of selection with each cluster in $A$. This can also be viewed as a procedure in which SHX preferentially selects individuals in more “promising” regions. This biased selection can encourage the evolution of the population and accelerate the overall convergence. Besides, the statistics of the population (e.g., cluster size) can be maintained between two consecutive generations because the new generation is sampled based on the statistics of the history. Also, the diversity of $O$ can be preserved because each newly generated individual from $C$ has a probability of being assigned to $O$.

The algorithm of offspring selection is shown in Algorithm 4. The input $C$ is excessively generated by existing crossover operators (Algorithm 1, line 8). Each candidate is labeled by the k-meansPredict function (Algorithm 4, line 1) based on the current clusters estimated from $A$. Then, the roulette is constructed based on $H$. The roulette selection is called $N_O$ times, yielding $N_O$ selected offspring. Each roulette selection produces a cluster ID, and one candidate in $C$ that belongs to the corresponding cluster is randomly selected and assigned to $O$. To avoid duplicate selection, a selected candidate is excluded from $C$. If no more candidates correspond to a certain cluster (which is rarely the case when sufficiently many candidates are generated), the roulette is reconstructed by eliminating the proportion of the corresponding cluster. Finally, $O$ is passed to the survivor selection process which determines $S$ using JGG.
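The roulette-based narrowing described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function signature, the precomputed `labels` list standing in for the k-meansPredict output, and the uniform fallback used when only zero-score clusters remain are assumptions introduced for this example.

```python
import random


def offspring_selection(candidates, labels, scores, n_offspring, rng=random):
    """Pick n_offspring candidates; clusters are drawn in proportion to their scores."""
    # Group candidate indices by their predicted cluster label.
    pools = {}
    for index, label in enumerate(labels):
        pools.setdefault(label, []).append(index)

    selected = []
    while len(selected) < n_offspring and pools:
        cluster_ids = list(pools)
        weights = [scores[c] for c in cluster_ids]
        if sum(weights) <= 0:
            # Assumed fallback: only zero-score clusters remain, so draw uniformly.
            weights = [1.0] * len(cluster_ids)
        cluster = rng.choices(cluster_ids, weights=weights)[0]
        # Draw without replacement inside the chosen cluster.
        pick = pools[cluster].pop(rng.randrange(len(pools[cluster])))
        selected.append(candidates[pick])
        if not pools[cluster]:
            # An exhausted cluster is removed, which rebuilds the roulette.
            del pools[cluster]
    return selected


if __name__ == "__main__":
    random.seed(0)
    candidates = ["a", "b", "c", "d", "e", "f"]
    labels = [0, 0, 0, 1, 1, 2]
    print(offspring_selection(candidates, labels, [0.7, 0.3, 0.0], 4))
```

No fitness value is consulted anywhere in this step, which reflects the point that offspring selection takes place before fitness evaluation.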

6. Experimental Results

The performance of SHX is investigated over 15 benchmark functions, with each function in two different dimension settings. We comprehensively compare the performance of RCGA with and without SHX, and SHX is run with different settings of archive update methods (random/sequential) and offspring generation methods (BLX [17]/SPX [21]/UNDX [20]).

6.1. Experimental Setup

Benchmark functions are a useful tool to verify the effectiveness of a method, and it is common to use several functions with different properties, such as in [29, 30]. We selected 15 benchmark functions with different characteristics from the literature [31–33] for evaluation. Detailed information on each function is summarized in Table 1. Initialization of the population and the archive is conducted within the range provided by the 4th column in Table 1. It is worth mentioning that the search space (i.e., the range of parameters) during the generation alternation is not constrained. Each function is labeled according to its combination of characteristics, namely unimodal (U) or multimodal (M) and separable (S) or nonseparable (NS), giving U + S, U + NS, M + S, and M + NS. By involving functions with various characteristics, we can analyze the proposed method more comprehensively and objectively. Furthermore, as the dimension of all selected functions is adjustable, we adopt two different numbers of dimensions to control the difficulty of the search problem.

The hyperparameter settings of the proposed method are listed in Table 2. The proposed method includes hyperparameters of both RCGA and SHX. In general, the search problem defined by each function becomes harder as the number of dimensions increases, which requires more evaluations. For adaptive adjustment, the number of generations and two further parameters listed in Table 2 are set proportional to the number of dimensions. The constant values of each parameter are empirically determined because the purpose of the experiments is to validate the effectiveness of SHX, rather than to achieve the best solution for each function.

All experiments are executed 100 times with different random seeds. In each experiment, the generation alternation runs for the full number of generations defined in Table 2. For a fair comparison, runs under the same random seed start from the same population. The runtime and fitness are recorded with a Python implementation (without either parallelization or optimization) on a desktop computer with an i7-7700 CPU at 3.60 GHz and 12.0 GB RAM.

6.2. Comparison in the Final-Generation-Elite

The absolute errors between the optimal value and the final-generation-elite fitness for all combinations of functions, dimensions, and methods are displayed in Table 3. Table 3 shows the minimum, maximum, median, mean, standard deviation (SD), and $p$ value of the Mann–Whitney U test for each combination. The Mann–Whitney U test evaluates the significance of the results with SHX against the results without SHX at a fixed significance level. Before showing the superiority brought by SHX, we first exclude a few results for which all the methods are trapped by local optima or cannot reach the global optimum. (1) Easom function. This function has several local minima. It is unimodal, and the global minimum occupies only a small area relative to the search space, which can hardly be reached. (2) Schwefel 2.26. Since the setup of this experiment does not restrict the range of parameters during the search, an extremely small fitness value (even smaller than the global optimum) can be achieved with this function, which makes it unsuitable for comparisons.

From Table 3, we can observe a clear improvement in performance brought by SHX. The $p$ values show that the methods with SHX are significantly better in at least 23 of the 30 settings. For the other five statistics (minimum, maximum, median, mean, and SD), the methods without SHX fail to outperform the methods with SHX in most settings. For instance, focusing on the minimum results, the methods without SHX outperform the methods with SHX only 5, 0, and 4 times for BLX, SPX, and UNDX, respectively. Among the SHX variants, the sequential archive update achieves the best performance. SH-BLX_sequential, SH-SPX_sequential, and SH-UNDX_sequential show significance in 27, 26, and 27 settings, respectively. In addition, they achieve the best results in most settings with respect to the maximum, median, and mean results. One possible reason why the sequential update outperforms the random update in most cases is that it removes the oldest individuals, which arrived first, and therefore SHX can select offspring according to the up-to-date search history and reflect the trend of evolution more sensitively. In contrast, the random update removes individuals from the archive uniformly at random, which may impede the discovery of new solutions since old individuals may be retained in the archive for more generations.

6.2.1. Analysis on BLX vs. SH-BLX

It is already known that the standard BLX [17] faces difficulties especially when the target function is nonseparable [34], due to its parameter-wise sampling. From the results for the nonseparable functions in Table 3, we can find that involving SHX significantly improves the performance, which indicates that SHX can help BLX to greatly mitigate this drawback. This is understandable because offspring selection with clusters embeds a distance measure, which builds a relationship among the parameters.

6.2.2. Analysis on SPX vs. SH-SPX

SPX [21] is a better alternative to BLX, and we can observe from Table 3 that SPX noticeably outperforms BLX. Table 3 also makes clear that SHX further boosts the performance of SPX to a large extent. In particular, the minimum and median results are improved by involving SHX for all settings. As pointed out in [21], SPX has the ability to maintain the mean and covariance of the parent individuals, which is consistent with the design guideline for good crossover operators mentioned in Section 3. Since SHX manages an archive that stores the search history over a few generations, it can preserve some useful statistics (e.g., centroids of clusters) for much longer, which explains why SHX is able to enhance SPX.

6.2.3. Analysis on UNDX vs. SH-UNDX

Similar to BLX and SPX, Table 3 shows that the results involving SHX are improved in most settings. UNDX is also designed to generate offspring inheriting the distribution of the parent individuals [35]. Therefore, statistics of the search history provided by SHX are useful for UNDX to enhance search ability.

6.3. Comparison in Convergence Curve

With the aid of search history, SHX not only achieves better results but also improves the convergence speed. In this section, we compare the convergence behavior over all the test functions for a single dimension setting. Evaluation values of elite individuals from the 1st generation to the 100th generation are plotted in Figure 2. The mean value of 100 trials is represented by the line, and the range between the minimum and the maximum is represented by the shaded area. A smaller area means a more stable search. It should be noted that, as the ranges of parameters are not constrained during the search procedure, methods can achieve arbitrarily small fitness values on Schwefel 2.26, and a lower value does not mean a better result for that function, as explained in Section 6.2.

For BLX, SPX, and UNDX, exploiting SHX yields faster convergence than the same operators without SHX in most cases. The superiority becomes more obvious when the problem setting is more difficult (e.g., multimodal functions vs. unimodal functions).

6.4. Comparison in Processing Time

In this section, we show the runtime overhead brought by SHX. Figure 3 compares the processing time of an optimization task (in which a single fitness evaluation takes 0.01 second) for BLX and SPX. The parameter setting follows Table 2, and all the results are averaged over 10 trials. It took 93.9 seconds and 94.1 seconds for BLX and SPX to complete the entire process, respectively. SH-BLX_random took 1.7 seconds longer than BLX, and SH-BLX_sequential took 1.6 seconds longer. Similarly, SH-SPX_random and SH-SPX_sequential each took 3.9 seconds longer than SPX. These numbers demonstrate that the additional runtime occupies only a small part of the total processing time. The additional computational costs mainly occur in the clustering of the archive data and the label assignment of candidate offspring. The cost can be further reduced by adopting an efficient distance measure or parallel computing. For a fixed archive size, the runtime grows linearly with the number of generations. Considering the complexity of the fitness function and the budget, SHX is a practical alternative to other crossover models.
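To put the reported runtimes in relative terms, the short script below computes the additional runtime as a percentage of the corresponding baseline. The numbers are those quoted in the paragraph above; the dictionary layout and the helper name `overhead_percent` are choices made for this illustration.

```python
# Runtimes reported in Section 6.4, in seconds.
baseline = {"BLX": 93.9, "SPX": 94.1}
additional = {
    "SH-BLX_random": ("BLX", 1.7),
    "SH-BLX_sequential": ("BLX", 1.6),
    "SH-SPX_random": ("SPX", 3.9),
    "SH-SPX_sequential": ("SPX", 3.9),
}


def overhead_percent(method):
    """Additional runtime as a percentage of the baseline runtime."""
    base_name, extra_seconds = additional[method]
    return 100.0 * extra_seconds / baseline[base_name]


if __name__ == "__main__":
    for name in additional:
        print(f"{name}: {overhead_percent(name):.1f}%")
```

Under these figures the relative overhead is below 2% for the BLX variants and roughly 4% for the SPX variants.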

7. Conclusions

In this paper, we have proposed a novel crossover model (SHX) which is simple yet effective and efficient. It can be easily integrated with any existing crossover operators. The key idea is to exploit search history over generations to gain useful information for generating offspring. Experimental results demonstrate that our SHX can significantly boost the performance of existing crossovers, in terms of the final solution and the convergence speed. Also, according to experiments, the induced extra runtime is negligible compared to the total processing time.

SHX still has a few limitations. (1) Additional hyperparameters need to be determined. (2) The induced additional runtime may be too large for applications which require high processing speed. As future work, we would like to address these limitations. For instance, hyperparameters can be adaptively set by considering specific contexts, and parallelization can be introduced to speed up SHX.

Data Availability

The test data used to support the findings of this study are included within the article.

Disclosure

A preliminary version of this work appears in GECCO2020 and has also been mentioned in the manuscript which can be viewed at the following link: https://arxiv.org/abs/2003.13508.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by JSPS KAKENHI, Grant number (JP18K17823).