Article

Q-Learnheuristics: Towards Data-Driven Balanced Metaheuristics

1
Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
2
Departamento de Ciencias de la Computación, Escuela Politécnica Superior, Universidad de Alcalá, 28805 Alcalá de Henares, Spain
3
Departamento de Física y Matemáticas, Facultad de Ciencias, Universidad de Alcalá, 28802 Alcalá de Henares, Spain
4
Escuela de Ingeniería en Construcción, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2147, Valparaíso 2362804, Chile
5
Escuela de Negocios Internacionales, Universidad de Valparaíso, Alcalde Prieto Nieto 452, Viña del Mar 2572048, Chile
6
Departamento de Informática, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile
7
Escuela de Computación e Informática, Universidad Bernardo O’Higgins, Av. Viel 1497, Santiago 8370993, Chile
*
Authors to whom correspondence should be addressed.
Mathematics 2021, 9(16), 1839; https://doi.org/10.3390/math9161839
Submission received: 10 May 2021 / Revised: 27 July 2021 / Accepted: 29 July 2021 / Published: 4 August 2021
(This article belongs to the Special Issue Mathematics and Engineering II)

Abstract

One of the central issues that must be resolved for a metaheuristic optimization process to work well is the dilemma of the balance between exploration and exploitation. Metaheuristics (MH) that achieve this balance can be called balanced MH. To this end, a Q-Learning (QL) integration framework was proposed for the selection of metaheuristic operators conducive to this balance, particularly the selection of binarization schemes when a continuous metaheuristic solves binary combinatorial problems. In this work, the use of this framework is extended to other recent metaheuristics, demonstrating that the integration of QL in the selection of operators improves the exploration-exploitation balance. Specifically, the Whale Optimization Algorithm and the Sine-Cosine Algorithm are tested by solving the Set Covering Problem, showing statistical improvements in this balance and in the quality of the solutions.

1. Introduction

In approximate methods, the guarantee of finding globally optimal solutions is sacrificed due to the computational complexity of hard optimization problems. Approximate algorithms can be classified into specific heuristics and MH. Heuristics are techniques specifically designed to solve a particular problem. On the contrary, MH are defined as upper-level general methodologies (templates) that can be used as guiding strategies for the design of underlying heuristics for solving a problem [1]. MH thus extend basic heuristic methods by including them in an iterative framework, augmenting their exploration and exploitation capabilities. Notice that exploration is the process of visiting new regions of a search space (solutions), whereas exploitation is the process of visiting those regions within the neighborhood of previously visited solutions. Thus, MH need to establish a good ratio between exploration and exploitation to be successful. That means that designing and applying good MH requires a proper trade-off between these two "forces" [2,3]. Unfortunately, the proper handling of this trade-off is an open question in the literature [4,5,6,7,8,9]. A comprehensive review about MH can be found in [1,10].
Optimization problems can be classified, depending on the domain of the decision variables, into discrete and continuous problems. In recent years, discrete optimization problems have become more and more frequent in industry, with problems such as the Set Covering Problem (SCP) [11,12], the Knapsack Problem [13], the Software Project Scheduling Problem [14,15] and Feature Selection [16]. The No-Free-Lunch Theorem (NFLT) [17] tells us that there is no universal optimization algorithm for all existing optimization problems. This means that, despite the existence of algorithms designed to solve discrete problems, none of them is good for all combinatorial optimization problems and there will always be a better one for a specific problem. This best algorithm can be one designed for discrete problems as well as an algorithm designed for continuous problems and adapted to discrete problems. However, many MH (most of them swarm intelligence) are designed to work in the continuous domain, meaning that binarization techniques are required [18]. Among the binarization operators, two-step techniques are the most commonly used [18], and their performance has been improved by the use of ML-based techniques [13,19].
Several variations of MH are proposed in the literature to improve MH algorithms. Among the most relevant trends, hybrid MH represent a class of algorithms that combine MH with other applicable algorithms. The resulting algorithms take advantage of the strengths of the algorithms composing the hybridization, finding better results while keeping complexity low. According to the taxonomy defined in [1] for hybrid MH, MH are usually combined with exact mathematical programming algorithms (resulting in matheuristics [20]), with simulation (resulting in simheuristics [21]) and with machine learning (ML) (resulting in learnheuristics [22,23,24]). This work focuses on proposing a learnheuristic.
Focusing on ML, these techniques have become popular in recent years with many applications, from industrial to everyday applications [25,26]. Usually, ML techniques are divided into supervised, unsupervised, semi-supervised and Reinforcement Learning. Supervised techniques learn from labeled data to infer future samples in the form of classification or regression [27]. Unsupervised techniques consider unlabeled data to find clusters or patterns with common algorithms such as k-means [28], DBSCAN [29] or BIRCH [30]. Semi-supervised techniques share features of supervised and unsupervised learning, resulting in a hybrid approach in which labeled data are managed in a supervised manner while unlabeled data are managed in an unsupervised manner [31]. Reinforcement Learning techniques are based on the notion of cumulative reward, where the system receives a positive or negative reward after each decision, adjusting its behavior according to this feedback. Thus, instead of learning from labeled data, as supervised techniques do, the system learns from the experience of making decisions [32].
Based on the existing difficulties in applying binarization techniques in MH algorithms designed for solving continuous problems, the main contribution of this paper is a novel binarization scheme powered by the QL algorithm, which is a Reinforcement Learning technique. The authors apply this binarization scheme in conjunction with two MH to solve the SCP, resulting in two hybrid MH. As a result, the authors verified that the hybrid approach including the novel binarization scheme outperforms the regular MH with a usual fixed binarization approach from the literature.
The remainder of this paper is organized as follows. Section 2 presents the related work. In Section 3, the classical binarization techniques are described. Section 4 describes the QL-based binarization scheme proposed in this paper. Section 5 explains the two implemented MH. Section 6 presents the SCP. Section 7 details the statistical methodology and the experimental results. Finally, Section 8 concludes the work with some final remarks.

2. Related Work

This section discusses related work along two lines. First, works in which metaheuristics are improved by applying ML techniques and, second, works in which Q-Learning is applied together with MH, resulting in hybrid MH.

2.1. Metaheuristics Enhanced by Machine Learning

Regarding ML techniques supporting MH, the work of García et al. [19] reviewed two lines of research. The first one consists of the integration of ML as a replacement for an operator (e.g., population management, solution initialization, local search, disturbance of solutions and parameter tuning, among others). The second one consists of the use of ML as a selection tool for a set of MH, choosing the most appropriate one for solving a specific instance of a problem.
Regarding the first line, the authors may cite the work of Veček et al. [33], where parameter tuning was performed with a chess rating system. In the work of Ries et al. [34], a similar implementation was performed but using fuzzy logic and Decision Trees. Deng et al. [35] proposed a clustering-based initial solution generation for the Traveling Salesman Problem. Within this line, the binarization operator has also been of considerable interest. Some examples are observed in [13], where the Multidimensional Knapsack Problem is solved by unsupervised learning techniques, in [36], which uses the concept of percentile, and in [37], where the Apache Spark framework is used for large combinatorial problems.
When considering ML as a selection tool to obtain one MH among a set of them, this task is divided into three groups. The first one is algorithm selection, which chooses among a set of techniques according to the characteristics of each problem in order to obtain better performance for a set of similar instances [38]. The second one is hyperheuristic strategies, which aim at automating the design of the MH to address a set of problems [39]. The third group is composed of cooperative strategies, which combine algorithms in a sequential or parallel manner to improve robustness, sharing part of a solution or its totality [40].
Reinforcement Learning is often used for the intelligent selection of operators [41]. An example of this is the work of Zhang et al. [42] where they proposed an adaptive evolutionary programming algorithm. For each individual, an optimal mutation operator was selected based on immediate performance.

2.2. Metaheuristics Enhanced by Q-Learning

Reviewing the literature, it is possible to find various implementations of QL supporting MH to solve a wide range of problems. Among the first works are those by Gambardella and Dorigo, where QL replaces the pheromone behavior in Ant Colony Optimization [43] to solve the Travelling Salesman Problem [44] and its asymmetric version. In [45], QL is used as a heuristic selector inside a hyperheuristic scheme to solve the Cutting Stock Problem. The same strategy is used in [46,47] to solve the cross-domain heuristic search challenge [48] and the stochastic mixed-model assembly line sequencing problem. In [42,49], a hybridization is proposed where QL is an intelligent selector of mutation operators in Genetic Algorithms. In [50], QL and SARSA are implemented for the League Championship Algorithm to strengthen the search power of each individual in the algorithm, in order to extract better stock trading rules for various types of trading conditions. In [51,52], parameters of the Firefly Algorithm [53] and of a version of the Sine Cosine Algorithm combined with Cuckoo Search are dynamically controlled to improve the search and to avoid falling into local optima. There are also several implementations in the literature where QL supports MH such as Particle Swarm Optimization [54,55], Cuckoo Search [56], the Bat Algorithm [57] and Meta-RaPS [58].
Summarizing what has been found in the literature, two models of conceptual interaction between QL and an MH can be identified: (1) QL supporting the search from within the MH; and (2) QL supporting the search from outside the MH (with a hyperheuristic approach).
In the first model, QL learns from the behavior of the operators associated with the MH during the search for solutions to the problem, i.e., it is an additional element within the layer of the MH.
In the second model, QL is a component of a higher-level layer, called a Hyperheuristic (HH), which at each iteration of the MH hands back operators to the lower level addressing the problem. In this case, the selection of the lower-level operators becomes the problem to be solved at the top layer, where the learning provided by QL takes place.
The literature review shows the contribution of ML works to improving the performance of metaheuristics, as well as the contribution of metaheuristics to improving the performance of ML. However, regarding an essential aspect of the behavior of an MH, such as the exploration-exploitation balance, there remains an interesting research gap to address in order to find better solutions that are applicable to industry.

3. Continuous Metaheuristics Working in Binary Domains

Binarization techniques for continuous MH consist of transferring the values from the continuous domain of the MH to a binary domain. This is conducted to preserve the quality of the movements of continuous MH and, thus, generate quality binary solutions.
Although there are MH that work in binary domains without the need to incorporate a binarization scheme, continuous MH together with a binarization scheme have presented great performance on diverse combinatorial NP-hard problems, which has attracted the interest of the scientific community. Some examples include Particle Swarm Optimization [59], the Binary Salp Swarm Algorithm [60], Binary Dragonfly [61] and the Binary Magnetic Optimization Algorithm [62], among others [63,64,65,66].
Among the binarization schemes, two large groups can be defined. The first group contains the operators that do not alter the operation of the other elements of the MH, where the two-step techniques stand out as the most widely used [18], together with the Angle Modulation technique [67]. The second group consists of the methods that alter the normal functioning of the MH, such as Quantum Binary [68] and Set-Based Approaches, in addition to the techniques based on clustering [13,19].

3.1. Two-Step Binarization Scheme

Two-step binarization schemes are of great relevance for various types of problems [69]. This binarization scheme is composed of two steps: the first one is the transfer function, which maps the values generated by the continuous MH to the continuous interval between 0 and 1, while the second step is binarization, which transforms this real value into a binary value. This is best exemplified in Figure 1.

3.1.1. Transfer Functions

Transfer functions were introduced by Kennedy et al. in 1997 [70]. Their main advantage is the delivery of a probability between 0 and 1 at a low computational cost. There are two types of functions, the S-Shaped [63] and the V-Shaped [71], each with four variations. The variations of the S-Shaped functions, $T_{S_i}(d_w^j)$ for $i \in \{1, 2, 3, 4\}$, are defined as follows:
$$T_{S_1}(d_w^j) = \frac{1}{1 + e^{-2 d_w^j}},$$
$$T_{S_2}(d_w^j) = \frac{1}{1 + e^{-d_w^j}},$$
$$T_{S_3}(d_w^j) = \frac{1}{1 + e^{-d_w^j / 2}},$$
and
$$T_{S_4}(d_w^j) = \frac{1}{1 + e^{-d_w^j / 3}},$$
where $d_w^j$ denotes the value of individual $w \in \{1, 2, \ldots, n\}$ in dimension $j \in \{1, 2, \ldots, l\}$, and $n$ and $l$ are the number of individuals and dimensions, respectively. The variations of the V-Shaped functions, $T_{V_i}(d_w^j)$ for $i \in \{1, 2, 3, 4\}$, are defined as follows:
$$T_{V_1}(d_w^j) = \left| \operatorname{erf}\!\left( \frac{\sqrt{\pi}}{2}\, d_w^j \right) \right|,$$
$$T_{V_2}(d_w^j) = \left| \tanh(d_w^j) \right|,$$
$$T_{V_3}(d_w^j) = \left| \frac{d_w^j}{\sqrt{1 + (d_w^j)^2}} \right|,$$
and
$$T_{V_4}(d_w^j) = \left| \frac{2}{\pi} \arctan\!\left( \frac{\pi}{2}\, d_w^j \right) \right|.$$
The results of plotting the pairs $(d_w^j, T_{S_i}(d_w^j))$ and $(d_w^j, T_{V_i}(d_w^j))$ for $i \in \{1, 2, 3, 4\}$ are shown in Figure 2.

3.1.2. Binarization

The second step is binarization, whose function is to discretize the probability $T(d_w^j)$ obtained from the transfer function described in Section 3.1.1, delivering a binary value.
For this step, there are different techniques in the literature such as Standard, Complement, Static Probability, Elitist and Elitist Roulette. The values for the binarization obtained by using different methods are defined as follows.
Let $X_{new,sd}^j$ be the binarization value obtained by using the Standard method. Then, $X_{new,sd}^j$ is defined as follows:
$$X_{new,sd}^j = \begin{cases} 1 & \text{if } rand \leq T(d_w^j) \\ 0 & \text{otherwise,} \end{cases}$$
where $rand$, with $0 \leq rand \leq 1$, is a random value generated following a uniform distribution $U(0,1)$, $T(d_w^j)$ denotes the transfer value of individual $w \in \{1, 2, \ldots, n\}$ in dimension $j \in \{1, 2, \ldots, l\}$, and $n$ and $l$ are the number of individuals and dimensions, respectively.
Let $X_{new,c}^j$ be the binarization value obtained by using the Complement method. Then, $X_{new,c}^j$ is defined as follows:
$$X_{new,c}^j = \begin{cases} \operatorname{Complement}(X_w^j) & \text{if } rand \leq T(d_w^j) \\ 0 & \text{otherwise,} \end{cases}$$
where $\operatorname{Complement}(X_w^j)$ denotes the complementary binary value of $X_w^j$ of individual $w$ in dimension $j$, i.e., if the value of $X_w^j$ is 0, the complement corresponds to 1.
Let $X_{new,sp}^j$ be the binarization value obtained by using the Static Probability method. Then, $X_{new,sp}^j$ is defined as follows:
$$X_{new,sp}^j = \begin{cases} 0 & \text{if } T(d_w^j) \leq \alpha \\ X_w^j & \text{if } \alpha < T(d_w^j) \leq \frac{1}{2}(1+\alpha) \\ 1 & \text{if } T(d_w^j) > \frac{1}{2}(1+\alpha), \end{cases}$$
where $X_w^j$ denotes the value of dimension $j \in \{1, 2, \ldots, l\}$ of individual $w \in \{1, 2, \ldots, n\}$, $l$ and $n$ are the number of dimensions and individuals, respectively, and $\alpha$ corresponds to a parameter determined by the user.
Let $X_{new,e}^j$ be the binarization value obtained by using the Elitist method. Then, $X_{new,e}^j$ is defined as follows:
$$X_{new,e}^j = \begin{cases} X_{Best}^j & \text{if } rand < T(d_w^j) \\ 0 & \text{otherwise,} \end{cases}$$
where $X_{Best}^j$ denotes the value of dimension $j$ of the individual $best$, which has obtained the best fitness so far, and $l$ and $n$ are the number of dimensions and individuals, respectively.
Let $X_{new,er}^j$ be the binarization value obtained by using the Elitist Roulette method [72]. Then, $X_{new,er}^j$ is defined as follows:
$$X_{new,er}^j = \begin{cases} P[X_{new,er}^j = \zeta^j] = \dfrac{f(\zeta^j)}{\sum_{\delta \in Q_g} f(\delta)} & \text{if } rand < T(d_w^j) \\ P[X_{new,er}^j = 0] = 1 & \text{otherwise.} \end{cases}$$
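To make the two-step scheme concrete, the following minimal Python sketch combines one transfer function (the S-Shaped $T_{S_2}$) with the Standard binarization rule; the function names and the array-based layout are illustrative assumptions, not the original implementation:

import numpy as np

def s2_transfer(d):
    """S-Shaped transfer T_S2: maps a continuous value to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-d))

def standard_binarization(prob, rng):
    """Standard rule: return 1 when a uniform random draw falls below the probability."""
    return (rng.random(prob.shape) <= prob).astype(int)

def two_step_binarize(continuous_solution, rng):
    """Two-step scheme: transfer function followed by a binarization rule."""
    prob = s2_transfer(continuous_solution)
    return standard_binarization(prob, rng)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    d = rng.normal(0.0, 2.0, size=10)   # continuous values produced by the MH
    print(two_step_binarize(d, rng))    # array of 0/1 values

Any of the other transfer functions or binarization rules of this section could be substituted in the same structure, which is precisely the degree of freedom exploited by the selector proposed in Section 4.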

4. Binarization Scheme Selector Proposal

Nowadays, combinatorial problems in the binary domain are becoming increasingly complex and frequent in industry. Solving them in a reasonable time with high-quality solutions is a priority for both academia and industry.
This work proposes an intelligent selector of binarization schemes, in which the binarization methods existing in the literature are integrated to control the exploration-exploitation balance and thus avoid local optima. This is because several authors [5,8,73,74,75] argue that, for a metaheuristic to work well, it must have a good balance between exploration and exploitation. Exploration or diversification consists of visiting unexplored regions of the search space to ensure that the search is not confined to a reduced set of regions [1]. In contrast, with exploitation or intensification, promising regions with good solutions are explored further in the hope of finding better solutions [1].
This new binarization strategy is inspired by the behavior of hyperheuristics and techniques that have performed well for various types of problems [39,76,77,78]. The method consists of using an intelligent operator to determine which type of binarization is most appropriate at the iteration level, i.e., based on information about the problem and the results obtained in previous iterations, the binarization scheme that is most likely to obtain the best quality results can be used.

4.1. Q-Learning as a Smart Operator

In the present work, QL is implemented as the intelligent operator of the proposal, which chooses the two-step binarization technique to be used according to a reward system, with which it learns in a deterministic way. The structure of the implemented proposal is exemplified in Figure 3.

4.2. Q-Learning

Within Reinforcement Learning (RL) techniques, there are Temporal Difference (TD) algorithms, which are characterized by exploring the environment and using this information to update the estimate of the current state [32]. TD algorithms work with the difference between the current estimate of the value of a state, the discounted value of the next state and the reward; they focus on state-to-state transitions and on the learned values of states.
Prominent among the TD algorithms is QL [79], which provides agents with the ability to learn to act in the best way through the consequences of the actions they take. There are different "states" and different possible "actions", where the "environment" is the current state in which the agent finds itself; the agent must select and execute an action, which affects the environment by changing its state. Actions are punished or rewarded through "rewards", which judge the consequence obtained by applying an action in a particular state. Rewards can be delayed, allowing the agent to learn from the system. In order to solve the problem, the agent tries to learn the course of actions that maximizes the accumulated reward. The learning process consists of a set of episodes, where in each episode the agent selects and executes an action in a particular state. For the new episode, the agent's learning is given by Equation (14):
$$Q_{new}(s_t, a_t) = (1 - \alpha) \cdot Q_{old}(s_t, a_t) + \alpha \cdot \left[ r_t + \gamma \cdot \max Q(s_{t+1}, a_{t+1}) \right]$$
where $Q_{new}(s_t, a_t)$ is called the Q-value and represents the cumulative quality or reward of taking action $a_t$ in state $s_t$ at time $t$; $r_t$ is the reward or punishment received when action $a_t$ is taken in state $s_t$ at time $t$; $Q_{old}$ is the Q-value of the previous iteration for action $a_t$ in state $s_t$; $\max Q(s_{t+1}, a_{t+1})$ is the maximum Q-value over the actions of the next state, i.e., the best action the agent can take in the next state; $\alpha$ is the learning factor, with $0 \leq \alpha \leq 1$; and $\gamma$ is the discount factor, with $0 \leq \gamma \leq 1$. If $\alpha$ is close to 0, the historical information learned becomes more relevant, whereas if $\alpha$ is close to 1 the information received immediately becomes more relevant. If $\gamma$ equals 0, only the immediate reward is taken into account, while, as it approaches 1, the future reward receives greater emphasis relative to the immediate reward. The procedure of this algorithm is presented in Algorithm 1.
Algorithm 1 Q-Learning Algorithm.
1: Initialize the $Q$-values, $\alpha$, $\gamma$
2: while $t <$ Maximum number of iterations do
3:     Choose $a_t$ based on the $Q$-values
4:     Execute action $a_t$ and get the immediate reward or punishment $r_t$
5:     Observe the next state $s_{t+1}$
6:     $Q_{new}(s_t, a_t) = (1 - \alpha) \cdot Q_{old}(s_t, a_t) + \alpha \cdot [r_t + \gamma \cdot \max Q(s_{t+1}, a_{t+1})]$
7:     $t \leftarrow t + 1$
8: end while
9: Return $Q_{new}(s_t, a_t)$
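As a sketch of how QL can drive the selection of binarization schemes, the following Python fragment maintains a Q-table indexed by the two states of Section 4.6 and the actions of Section 4.4, selects an action with an epsilon-greedy policy and applies the update of Equation (14). The class, parameter names and the epsilon-greedy choice are illustrative assumptions rather than the authors' implementation:

import numpy as np

class QLSchemeSelector:
    """Minimal Q-Learning selector over a fixed set of actions (binarization schemes)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.4, epsilon=0.05, seed=0):
        self.q = np.zeros((2, n_actions))   # rows: 0 = exploration state, 1 = exploitation state
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def choose_action(self, state):
        """Epsilon-greedy choice based on the Q-values of the current state."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        """Equation (14): blend the old Q-value with the reward plus the discounted best next Q-value."""
        best_next = np.max(self.q[next_state])
        self.q[state, action] = (1 - self.alpha) * self.q[state, action] \
                                + self.alpha * (reward + self.gamma * best_next)

In the proposal, the state would be given by getState (Section 4.6), the reward by getMetric together with one of the reward functions of Section 4.3, and the chosen action would index one of the 40 two-step combinations described in Section 4.4.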

4.3. Rewards

The reward in an RL algorithm is fundamental for the correct performance of the algorithm; this is why there are several methods in the literature to calculate rewards. The type of reward computed from the chosen metrics determines the value $r_t$ in the general QL equation, as detailed in Figure 4.
For the implementation of this work, five forms of reward were used, two of which are among the simplest in the literature. The first is the one used in [46,54], where the Q-value is increased by a fixed amount for the action that generated an improvement in the global fitness of the problem and decreased by the same fixed amount if no improvement was generated, as detailed in Equation (15). The second type of reward is a variation of the previous one, where there is no penalty on the Q-table values, as presented in Equation (16). The last three rewards are collected by Nareyek in [80] and are detailed in Equations (17)–(19), respectively:
$$withPenalty = \begin{cases} +1 & \text{if there is a fitness improvement} \\ -1 & \text{otherwise,} \end{cases}$$
$$withoutPenalty = \begin{cases} +1 & \text{if there is a fitness improvement} \\ 0 & \text{otherwise,} \end{cases}$$
$$globalBest = \begin{cases} \dfrac{W}{BestFitness} & \text{if there is a fitness improvement} \\ 0 & \text{otherwise,} \end{cases}$$
$$rootAdaptation = \begin{cases} \sqrt{BestFitness} & \text{if there is a fitness improvement} \\ 0 & \text{otherwise,} \end{cases}$$
and
$$escalatingMultiplicativeAdaptation = \begin{cases} W \cdot BestFitness & \text{if there is a fitness improvement} \\ 0 & \text{otherwise,} \end{cases}$$
where $W$ and $BestFitness$ are defined as a constant of value 10 and the best fitness found so far, respectively.

4.4. Actions $a_t$

As mentioned above, the main objective of Q-Learning is to find an optimal policy within a set of actions. Therefore, it is important to define which actions the agent will take during its learning process. In the present work, the actions taken by the agent are the combinations of the transfer functions and the binarization rules of the two-step technique. Thus, 40 possible actions can be selected during the learning process, as enumerated in the sketch below.
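A minimal way to build this action set, assuming the eight transfer functions of Section 3.1.1 and the five binarization rules of Section 3.1.2 are identified by name (the identifiers are illustrative), is the following:

from itertools import product

TRANSFER_FUNCTIONS = ["S1", "S2", "S3", "S4", "V1", "V2", "V3", "V4"]
BINARIZATION_RULES = ["Standard", "Complement", "StaticProbability", "Elitist", "ElitistRoulette"]

# Each action is a (transfer function, binarization rule) pair: 8 x 5 = 40 actions.
ACTIONS = list(product(TRANSFER_FUNCTIONS, BINARIZATION_RULES))
assert len(ACTIONS) == 40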

4.5. Obtaining Metrics (getMetric)

The reward or punishment is judged by the consequence obtained by performing the action. Therefore, it is important to define the comparison metric used to discriminate that consequence. In the present work, the comparison metric is the fitness obtained in each iteration of the optimization process, which is compared with the best fitness obtained. If the fitness improves, the action is rewarded, while if the fitness worsens, the action is punished.

4.6. State Determination (getState)

As QL carries out its learning process through the state transition, it is important to define which states to use and how it will transition between them.
In the present work, two states were defined which refer to the phases of a metaheuristic: exploration and exploitation. These states were not chosen at random since, as mentioned above, the objective of this work is to improve the balance of exploration and exploitation of metaheuristics to obtain better results.
In the literature, different authors [81,82,83,84,85] propose metrics that allow us to quantify the diversity of individuals in population-based algorithms, where Hussain's Dimensional Diversity [85] stands out.
Let $Div$ be the diversity of the population at a particular time. In order to calculate $Div$, the following equation is used:
$$Div = \frac{1}{l \cdot n} \sum_{d=1}^{l} \sum_{i=1}^{n} \left| \bar{x}^d - x_i^d \right|$$
where $\bar{x}^d$ denotes the mean of the individuals in dimension $d$, $x_i^d$ is the value of the $i$-th individual in the $d$-th dimension, $n$ is the number of individuals in the population and $l$ is the number of dimensions of the individuals.
One of the methods to estimate exploration and exploitation is the one proposed by Morales-Castañeda et al. in [4], who, based on the quantification of the diversity of a population, proposed a method to estimate exploration and exploitation in terms of percentages. The percentages of exploration (XPL%) and exploitation (XPT%) are calculated as follows:
$$XPL\% = \frac{Div}{Div_{max}} \cdot 100$$
and
$$XPT\% = \frac{\left| Div - Div_{max} \right|}{Div_{max}} \cdot 100$$
where $Div$ is the diversity value given by Equation (20) and $Div_{max}$ denotes the maximum diversity value found during the entire optimization process.
Equations (21) and (22) are generic, so it is possible to use any other metric that calculates the diversity of a population.
Thus, the transition between states is determined as follows:
$$next\ state = \begin{cases} Exploration & \text{if } XPL\% \geq XPT\% \\ Exploitation & \text{if } XPL\% < XPT\% \end{cases}$$
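A compact Python sketch of the diversity, the exploration/exploitation percentages and the state transition above, under the assumption that the population is stored as an $n \times l$ NumPy array (names are illustrative), could look as follows:

import numpy as np

def dimensional_diversity(population):
    """Hussain's Dimensional Diversity (Equation (20)): mean absolute deviation per dimension."""
    mean_per_dim = population.mean(axis=0)            # mean of every dimension d
    return np.abs(population - mean_per_dim).mean()   # averages over both individuals and dimensions

def exploration_exploitation(div, div_max):
    """Percentages of exploration (XPL%) and exploitation (XPT%), Equations (21) and (22)."""
    xpl = 100.0 * div / div_max
    xpt = 100.0 * abs(div - div_max) / div_max
    return xpl, xpt

def next_state(xpl, xpt):
    """State transition: 0 = Exploration, 1 = Exploitation."""
    return 0 if xpl >= xpt else 1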

5. Instantiated Metaheuristics

In this section, we describe the MH to be used, which are the Sine-Cosine Algorithm [86] and the Whale Optimization Algorithm [87]. Both have different implementations for solving combinatorial problems; the choice of these MH over others is based on the no-free-lunch theorem [17,88], which leaves all MH with the same probability of success until their performance on each specific problem has been demonstrated by experimentation.

5.1. Sine-Cosine Algorithm

To the best of our knowledge, the Sine-Cosine Algorithm (SCA) was first defined by Mirjalili [89] in 2016. It is based on the sine and cosine trigonometric functions. Like all iteration-based optimization techniques, it starts with a random population. Let $r_1$, $r_2$, $r_3$ and $r_4$ be the four parameters of the motion equations. Thus, let $r_1$ be the parameter that determines the direction of the motion relative to the best solution, given by:
$$r_1 = a - t\,\frac{a}{T}$$
where $T$ represents the total number of iterations to be performed, $t$ is the current iteration of the optimization process and $a$ is a constant. In the first iterations, the motion consists of moving away from the best solution (exploration) and, in the last iterations, of moving closer to the best solution (exploitation).
Let $r_2$ be the parameter that defines the magnitude of the motion, given by:
$$r_2 = 2 \pi \cdot rand$$
where $0 \leq rand \leq 1$, and $r_2$ represents the domain $[0, 2\pi]$ of the sine and cosine functions.
Let $r_3$ be the parameter that introduces randomness into the motion, given by:
$$r_3 = 2 \cdot rand$$
where $0 \leq rand \leq 1$. If $r_3 > 1$, the motion will be more stochastic.
Finally, let $r_4$ be the parameter that determines whether the motion is performed with the sine or the cosine function, in equal proportion:
$$r_4 = rand$$
where $0 \leq rand \leq 1$. The motion equation then depends on the value of $r_4$. Let $X_i^{t+1}$ be the $i$-th component of the solution in iteration $t+1$; its value depends on $r_4$ as follows:
$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 \cdot X_{Best}^t - X_i^t \right| & \text{if } r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 \cdot X_{Best}^t - X_i^t \right| & \text{if } r_4 \geq 0.5 \end{cases}$$
where $X_i^t$ denotes the $i$-th component of the solution in iteration $t$ and $X_{Best}^t$ denotes the $i$-th component of the best solution in iteration $t$. The procedure of the MH is presented in Algorithm 2.
Algorithm 2 Sine-Cosine Algorithm.
1: Initialize a set of search agents (solutions) $X$
2: while $t \leq$ Maximum number of iterations do
3:     Evaluate each of the search agents with the objective function
4:     Update the best solution obtained so far ($X_{Best}$)
5:     Update $r_1$, $r_2$, $r_3$ and $r_4$
6:     Update the position of the search agents using Equation (28)
7:     $t \leftarrow t + 1$
8: end while
9: Return the best solution obtained ($X_{Best}$)
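A minimal sketch of the SCA position update (Equations (24)–(28)), assuming the population is a NumPy array of shape $n \times l$, is shown below; the continuous positions it produces would then be passed through a binarization scheme from Section 3. The function name and signature are illustrative assumptions:

import numpy as np

def sca_update(population, best, t, T, a=2.0, rng=None):
    """One SCA iteration: move every agent toward/away from the best solution (Eq. (28))."""
    rng = rng or np.random.default_rng()
    n, l = population.shape
    r1 = a - t * (a / T)                  # Eq. (24): shrinks from a to 0 over the iterations
    r2 = 2 * np.pi * rng.random((n, l))   # Eq. (25): angle of the movement
    r3 = 2 * rng.random((n, l))           # Eq. (26): random weight of the best solution
    r4 = rng.random((n, l))               # Eq. (27): sine/cosine switch
    step = np.abs(r3 * best - population)
    sine_move = population + r1 * np.sin(r2) * step
    cosine_move = population + r1 * np.cos(r2) * step
    return np.where(r4 < 0.5, sine_move, cosine_move)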

5.2. Whale Optimization Algorithm

The Whale Optimization Algorithm (WOA) is inspired by the hunting behavior of humpback whales, specifically how they use a strategy known as "bubble-net feeding". This strategy consists of locating the prey and closing in on it by moving in spiral turns similar to a "9". The algorithm was proposed by Mirjalili and Lewis [90].
The WOA metaheuristic starts with a set of random solutions. At each iteration, the search agents update their positions with respect to either a randomly chosen search agent or the best solution obtained so far. A parameter $a$ is reduced from 2 to 0 to provide the change between exploration and exploitation. When the vector $A$ of Equation (29) satisfies $|A| \geq 1$, a new random search agent is chosen, whereas when $|A| < 1$ the best solution is selected; in both cases the goal is to update the position of the search agents.
On the other hand, the value of the parameter $p$ (a random number between 0 and 1) allows the algorithm to switch between a spiral and a circular motion. Accordingly, three movements are crucial when working with this metaheuristic:
  • Searching for prey ($p < 0.5$ and $|A| \geq 1$): The whales search for prey randomly based on the positions of each other. When the algorithm determines that $|A| \geq 1$, it is exploring, which allows WOA to perform a global search. This first movement is represented by the following mathematical model:
    $$X_i^{t+1} = X_{rand}^t - A \cdot D, \qquad D = \left| C \cdot X_{rand}^t - X_i^t \right|$$
    where $t$ denotes the current iteration, $A$ and $C$ are coefficient vectors and $X_{rand}$ is a random position vector (i.e., a random whale) chosen from the current population. The vectors $A$ and $C$ are computed according to Equation (30):
    $$A = 2a \cdot r - a, \qquad C = 2 \cdot r$$
    where $a$ decreases linearly from 2 to 0 over the iterations (in both the exploration and exploitation phases) and $r$ corresponds to a random vector with values in $[0, 1]$.
  • Encircling the prey ($p < 0.5$ and $|A| < 1$): Once the whales have found and recognized their prey, they begin to encircle it. Since the position of the optimum in the search space is not known a priori, the metaheuristic assumes that the current best solution is the target prey or is close to the optimum. Therefore, once the best search agent is defined, the other agents attempt to update their positions toward it. Mathematically, this is modeled in Equation (31):
    $$X_i^{t+1} = X^{*t} - A \cdot D, \qquad D = \left| C \cdot X^{*t} - X_i^t \right|$$
    where $X^*$ is the position vector of the best solution obtained so far and $X$ is the position vector. The vectors $A$ and $C$ are calculated as in Equation (30). It is worth mentioning that $X^*$ must be updated at each iteration if a better solution exists.
  • Bubble-net attack ($p \geq 0.5$): For this attack, the "shrinking net mechanism" is used; this behavior is achieved by decreasing the value of $a$ in Equation (30). Thus, as the whale spirals, it shrinks the bubble net until it finally catches the prey. This motion is modeled by Equation (32):
    $$X_i^{t+1} = D \cdot e^{bl} \cdot \cos(2\pi l) + X^{*t}, \qquad D = \left| X^{*t} - X_i^t \right|$$
    where $D$ is the distance of the $i$-th whale from the prey (the best solution obtained so far), $b$ is a constant that defines the shape of the logarithmic spiral and $l$ is a random number in $[-1, 1]$.
It is worth mentioning that humpback whales simultaneously swim around the prey within a shrinking circle and along a spiral trajectory. In order to model this simultaneous behavior, there is a 50% probability of choosing between the encircling prey mechanism (2) or the spiral model (3) to update the position of the whales during optimization. The mathematical model is as follows:
$$X_i^{t+1} = \begin{cases} X^{*t} - A \cdot D & \text{if } p < 0.5 \\ D \cdot e^{bl} \cdot \cos(2\pi l) + X^{*t} & \text{if } p \geq 0.5 \end{cases}$$
We include the pseudo-code (Algorithm 3) of the metaheuristic [91] for a better understanding of what was previously stated.
Algorithm 3 Whale Optimization Algorithm.
1: Initialize the whale population $X_i$ $(i = 1, 2, \ldots, n)$
2: Calculate the fitness of each search agent
3: $X^*$ = the best search agent
4: while $t \leq$ Maximum number of iterations do
5:     for each search agent do
6:         Update $a$, $A$, $C$, $l$ and $p$
7:         if $p < 0.5$ then
8:             if $|A| < 1$ then
9:                 Update the position of the current search agent using Equation (31)
10:            else ($|A| \geq 1$)
11:                Select a random search agent ($X_{rand}$)
12:                Update the position of the current search agent using Equation (29)
13:            end if
14:        else ($p \geq 0.5$)
15:            Update the position of the current search agent using Equation (32)
16:        end if
17:    end for
18:    Check if any search agent goes beyond the search space and modify it
19:    Calculate the fitness of each search agent
20:    Update $X^*$ if there is a better solution
21:    $t \leftarrow t + 1$
22: end while
23: Return $X^*$
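The following Python sketch condenses the three WOA movements (Equations (29)–(32)) for a single agent, using scalar coefficients $A$ and $C$ for simplicity; the names and the simplifications are illustrative assumptions and not the authors' implementation:

import numpy as np

def woa_update_agent(x, x_best, population, a, b=1.0, rng=None):
    """Update one whale with the searching, encircling or bubble-net spiral movement."""
    rng = rng or np.random.default_rng()
    r = rng.random()
    A = 2 * a * r - a                      # Eq. (30): a decreases from 2 to 0 over iterations
    C = 2 * rng.random()
    p = rng.random()
    l = rng.uniform(-1.0, 1.0)
    if p < 0.5:
        if abs(A) >= 1:                    # searching for prey (Eq. (29)): exploration
            x_rand = population[rng.integers(len(population))]
            return x_rand - A * np.abs(C * x_rand - x)
        return x_best - A * np.abs(C * x_best - x)   # encircling the prey (Eq. (31))
    D = np.abs(x_best - x)                 # bubble-net spiral (Eq. (32))
    return D * np.exp(b * l) * np.cos(2 * np.pi * l) + x_best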

6. Set Covering Problem

The SCP is defined by a binary matrix $A$, where $a_{ij} \in \{0, 1\}$ is the value of each cell of the matrix $A$, and $i$ and $j$ index its $m$ rows and $n$ columns, respectively:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
A column $j$ is said to cover a row $i$ if $a_{ij}$ is equal to 1, and not to cover it if $a_{ij}$ is 0. In addition, each column has an associated cost $c_j \in C$, where $C = \{c_1, c_2, \ldots, c_n\}$, and $I = \{1, 2, \ldots, m\}$ and $J = \{1, 2, \ldots, n\}$ are the sets of rows and columns, respectively.
The objective of the problem is to minimize the cost of a subset $S \subseteq J$, subject to the constraint that every row $i \in I$ is covered by at least one column $j \in J$. The decision variable $x_j$ takes the value 1 when column $j$ is in the solution subset $S$, and 0 otherwise.
The SCP can thus be formulated as follows:
Minimize
$$Z = \sum_{j=1}^{n} c_j x_j$$
subject to
$$\sum_{j=1}^{n} a_{ij} x_j \geq 1 \quad \forall i \in I,$$
$$x_j \in \{0, 1\} \quad \forall j \in J.$$
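As a sketch, evaluating a candidate binary solution for the SCP, assuming the coverage matrix and the cost vector are NumPy arrays, can be done as follows (a bare feasibility check; in practice infeasible solutions would be repaired or penalized; names are illustrative):

import numpy as np

def scp_fitness(x, A, costs):
    """Total cost of the selected columns, or None if some row is left uncovered."""
    covered = A @ x >= 1          # constraint: every row covered by at least one chosen column
    if not covered.all():
        return None               # infeasible solution
    return float(costs @ x)

# Example: 3 rows, 4 columns
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]])
costs = np.array([2.0, 3.0, 4.0, 1.0])
x = np.array([0, 0, 1, 1])        # choose columns 3 and 4
print(scp_fitness(x, A, costs))   # 5.0: rows 1 and 2 covered by column 3, row 3 by column 4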

7. Experimental Results and Performance Evaluation

In order to determine whether the integration of QL as a binarization scheme selector improves the results of the MH, five versions of QL have been implemented with different reward functions, named as indicated in Table 1.
The five implementations of QL have been compared, on a subset of instances of the Set Covering Problem for each metaheuristic, against two binarization scheme recommendations from the literature, which are presented in Table 2.
Both the code and the results obtained can be reviewed in the GitHub repository.

7.1. Statistical Test

Given the increasing use of MH applied to different combinatorial problems, there is a natural interest in comparing which one performs better because, sometimes, it is not so obvious as to which one is better. In this sense, statistical techniques provide a real alternative to compare results.
In order to determine the difference between the results obtained by different algorithms, it is necessary to use a statistical technique to establish whether the difference exists [92,93]. The most appropriate test to compare our algorithms is the Wilcoxon–Mann–Whitney test. This test is specifically used when two samples are independent and we cannot assume normality of at least one of them. The hypotheses used for this test are as follows:
$$H_0: \mu_A \geq \mu_B \qquad H_1: \mu_A < \mu_B$$
where $\mu_A$ and $\mu_B$ denote the average values provided by Algorithms A and B, respectively. We assume that if a $p$-value $< 0.05$ is obtained, $H_0$ is rejected and $H_1$ is accepted.
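As a sketch, this one-sided comparison between two sets of independent runs could be carried out with SciPy as follows; the arrays shown are placeholders (shortened for brevity), not the paper's data:

from scipy.stats import mannwhitneyu

# Best fitness values of independent runs per algorithm (placeholder values).
results_A = [430, 432, 431, 433, 430, 434, 431]   # e.g., a QL-based version
results_B = [438, 440, 437, 441, 439, 442, 438]   # e.g., a fixed binarization scheme

# H1: algorithm A yields smaller (better) values than algorithm B.
stat, p_value = mannwhitneyu(results_A, results_B, alternative="less")
if p_value < 0.05:
    print(f"A is statistically better than B (p = {p_value:.4f})")
else:
    print(f"No significant difference detected (p = {p_value:.4f})")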

7.2. Experimental Results

In order to present the results obtained by WOA and SCA, Table 3 and Table 4 are arranged. In both tables, the best values obtained in each row are highlighted. The experiments solve the SCP with Beasley's OR-Library instances, totaling 45 instances. For both metaheuristics (WOA and SCA), the instances were run with a population consisting of 40 individuals and 1000 iterations per run. With this, the stopping condition is 40,000 evaluations of the objective function, as used in [92]. The implementation was developed in Python 3.8.5 and processed using the free Google Colaboratory service [94]. The parameter settings for the SARSA and QL algorithms are as follows: $\gamma = 0.4$ and $\alpha = 0.1$. The tables are composed as follows: the first column corresponds to the name of the instance, the second column is the optimum known to date, and the next four columns (Best, Avg, Sec and RPD) present the best value, the average obtained over the 31 independent runs, the average execution time and, finally, the Relative Percentage Deviation defined in Equation (38). These four columns are repeated for all versions (BCL1, MIR2, QL1, QL2, QL3, QL4 and QL5). Finally, the last row is the sum of each column. We can note that the adaptations and versions of QL produce effects on the chosen metaheuristics, obtaining better results than their counterparts without QL.
$$RPD = 100 \cdot \frac{Best - Opt}{Opt}.$$
In Figure 5, Figure 6, Figure 7 and Figure 8, a comparison of the best results obtained in 31 independent runs of the chosen schemes is presented, showing less dispersion in the results for the versions with QL. This supports the robustness of the proposal compared to fixed binarization schemes, since, in addition to obtaining better results, the QL versions vary less across independent runs.
Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16 show the exploration and exploitation balance obtained by the algorithms. The x-axis represents the total number of iterations and the y-axis shows the percentages of exploration and exploitation, measured by Equations (20)–(22).
By observing the behavior of the exploration and exploitation percentages in the figures, two different behaviors can be clearly distinguished. In the case of Figure 10 and Figure 12, there is a clear predominance of exploration in the first iterations, while exploitation then increases and maintains high values during the rest of the iterations; this is the behavior recommended by [4]. In the case of Figure 14 and Figure 16, in contrast, the exploration percentage remains at high values during all iterations, a behavior similar to that observed when random searches occur. For the versions with QL, the expected exploration and exploitation behavior is observed, where exploration predominates at the beginning and gradually gives way to exploitation. In these versions, however, variations in the percentages are observed, caused by the dynamic change of the binarization scheme in each iteration, which in turn is reflected in better quality results.
These variations in the exploration and exploitation percentages open the discussion about the implications of changing binarization schemes in real time, how to quantify these percentage variations and how to assess the quality of these changes.

8. Conclusions

Today, Machine Learning techniques are increasingly used in most areas of research, as data capture has been steadily increasing in recent years. MH have not been the exception: ML techniques have supported MH from various approaches in order to improve their performance. This is one of the main motivations for this proposal, since MH generate a large amount of data that is not always used in their operation. The brief review of QL implementations shows how this technique improves the performance of MH, identifying two methods of implementation: (1) as a selector of low-level heuristics in the context of hyperheuristics; (2) as a selector of a certain operator among a set of operators, as in the choice of one mutation operator among several. It is also noted that the Q-table tends to be small, since the sets of operators to be selected are usually small compared with the numbers of input variables handled by other Machine Learning techniques. The use of enhancements and combinations of QL with other Temporal Difference techniques, such as SARSA (State-Action-Reward-State-Action), is also identified.
The algorithms presented in this work have shown that the choice of binarization schemes affects the balance of exploration and exploitation, turning them into balanced metaheuristics [95], as used in [96,97,98]. The balance achieved by our method shows an improvement in the quality of solutions, obtaining statistically significantly better results.
From the results obtained, it can be concluded that the versions that incorporate QL in the selection of binarization schemes present variations in the percentages of exploration and exploitation, which are reflected in improvements in the quality of the solutions with respect to the techniques that do not present these perturbations, i.e., the static versions. These perturbations can be associated with better quality movements: when the percentages of exploration and exploitation vary, the solutions move at a greater rate through the search space and thus present a greater probability of finding better values.
On the other hand, when observing the comparison between the average execution times ("Sec" column in Table 3 and Table 4), a large increase in time is observed for the versions that incorporate the binarization scheme selector, of approximately 447% for WOA and 223% for SCA. This is justifiable since incorporating QL into the iterative process of the MH implies a greater computational demand, because at each iteration the decision of which binarization scheme to use must be taken. However, comparing against a single binarization scheme, out of the 40 combinations already explained in Section 3, produces an unequal comparison. Without a selector of binarization schemes, all 40 combinations presented in this work would have to be tested, but since this would involve too much computation time, recommendations from the literature are usually followed; however, there is no guarantee that these are the best binarization schemes for the problem and techniques implemented. Consequently, our proposal of a binarization scheme selector is all the more relevant, since the high computational cost of choosing binarization schemes is often not affordable; under this scenario, our proposal excels when compared with fixed binarization schemes.
In terms of future work, we will consider evaluating other MH with exploration and exploitation behaviors similar to those presented, as well as more established MH such as Differential Evolution (DE) and Particle Swarm Optimization (PSO), in order to verify that the incorporation of QL generates the same effect on them. We also consider the evaluation of other Temporal Difference techniques to compete with QL, the inclusion of other transfer functions existing in the literature, such as O-Shaped functions, and the evaluation of other reward methods in addition to the five evaluated in the present work.

Author Contributions

B.C.: Conceptualization, funding acquisition, investigation, methodology, project administration, resources, supervision, writing—original draft and writing—review and editing. R.S.: conceptualization, funding acquisition, investigation, methodology, project administration, resources, writing—original draft and Writing—review and editing. J.L.-R.: data curation, investigation, software, visualization, writing—original draft and formal analysis. M.B.-R.: data curation, investigation, software, visualization, writing—original draft, formal analysis. J.M.L.-G.: formal analysis, methodology, validation, writing—review and editing. N.C.: formal analysis, methodology, validation, writing—review and editing. M.C.: data curation, investigation, software, visualization, writing—original draft. D.T.: data curation, investigation, software, visualization, writing—original draft. F.C.-C.: data curation, investigation, software, visualization, writing—original draft. J.G.: formal analysis, funding acquisition, methodology, validation, writing—review and editing. G.A.: formal analysis, methodology, validation, writing—review and editing. C.C.: formal analysis, methodology, validation, writing—review and editing. J.-M.R.: formal analysis, methodology, validation, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Grant ANID/FONDECYT/REGULAR/1210810.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code used to replicate the results can be found at: https://github.com/joselemusr/BSS-QL.

Acknowledgments

Broderick Crawford is supported by Grant ANID/FONDECYT/REGULAR/ 1210810: “DATA-DRIVEN AMBIDEXTROUS METAHEURISTICS: USING MACHINE LEARNING APPROACHES TO MANAGE BALANCE OF EXPLORATION AND EXPLOITATION WHEN SOLVING COMBINATORIAL PROBLEMS WITH CONTINUOUS SWARM INTELLIGENCE ALGORITHMS”. Ricardo Soto is supported by Grant ANID/FONDECYT/REGULAR/1190129: “BUILDING REACTIVE LEARNING-BASED HYBRID METAHEURISTICS”. José Lemus-Romani is supported by National Agency for Research and Development (ANID)/Scholarship Program/ DOCTORADO NACIONAL/2019-21191692. Marcelo Becerra-Rozas is supported by National Agency for Research and Development (ANID)/Scholarship Program/DOCTORADO NACIONAL/ 2021-21210740. José García was supported by the Grant ANID/FONDECYT/INICIACION/ 11180056: “APPLYING MACHINE LEARNING TECHNIQUES TO METAHEURISTIC ALGORITHMS TO SOLVE COMBINATORIAL OPTIMIZATION PROBLEMS”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Talbi, E. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 74.
  2. Xu, J.; Zhang, J. Exploration-exploitation tradeoffs in metaheuristics: Survey and analysis. In Proceedings of the 33rd Chinese Control Conference, Nanjing, China, 28–30 July 2014; pp. 8633–8638.
  3. Yang, X.S.; Deb, S.; Fong, S. Metaheuristic algorithms: Optimal balance of intensification and diversification. Appl. Math. Inf. Sci. 2014, 8, 977.
  4. Morales-Castañeda, B.; Zaldivar, D.; Cuevas, E.; Fausto, F.; Rodríguez, A. A better balance in metaheuristic algorithms: Does it exist? Swarm Evol. Comput. 2020, 54, 100671.
  5. Eftimov, T.; Korošec, P. Understanding exploration and exploitation powers of meta-heuristic stochastic optimization algorithms through statistical analysis. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Prague, Czech Republic, 13–17 July 2019; pp. 21–22.
  6. Hussain, A.; Muhammad, Y.S. Trade-off between exploration and exploitation with genetic algorithm using a novel selection operator. Complex Intell. Syst. 2019, 6, 1–14.
  7. Glover, F.; Samorani, M. Intensification, Diversification and Learning in metaheuristic optimization. J. Heuristics 2019, 25, 517–520.
  8. Hussain, K.; Salleh, M.N.M.; Cheng, S.; Shi, Y. On the exploration and exploitation in popular swarm-based metaheuristic algorithms. Neural Comput. Appl. 2019, 31, 7665–7683.
  9. Liu, H.L.; Chen, L.; Deb, K.; Goodman, E.D. Investigating the effect of imbalance between convergence and diversity in evolutionary multiobjective algorithms. IEEE Trans. Evol. Comput. 2016, 21, 408–425.
  10. Gendreau, M.; Potvin, J.Y. Handbook of Metaheuristics; International Series in Operations Research & Management Science; Springer: Cham, Switzerland, 2019; Volume 272.
  11. Crawford, B.; Soto, R.; Astorga, G.; Lemus-Romani, J.; Misra, S.; Rubio, J.M. An adaptive intelligent water drops algorithm for set covering problem. In Proceedings of the 2019 19th International Conference on Computational Science and Its Applications (ICCSA), St. Petersburg, Russia, 1–4 July 2019; pp. 39–45.
  12. Crawford, B.; Soto, R.; Olivares, R.; Embry, G.; Flores, D.; Palma, W.; Castro, C.; Paredes, F.; Rubio, J.M. A binary monkey search algorithm variation for solving the set covering problem. Nat. Comput. 2019, 19, 825–841.
  13. García, J.; Crawford, B.; Soto, R.; Castro, C.; Paredes, F. A k-means binarization framework applied to multidimensional knapsack problem. Appl. Intell. 2018, 48, 357–380.
  14. Crawford, B.; Soto, R.; Astorga, G.; Lemus, J.; Salas-Fernández, A. Self-configuring Intelligent Water Drops Algorithm for Software Project Scheduling Problem. In International Conference on Information Technology & Systems; Springer: Cham, Switzerland, 2019; pp. 274–283.
  15. Crawford, B.; Soto, R.; Astorga, G.; Castro, C.; Paredes, F.; Misra, S.; Rubio, J.M. Solving the software project scheduling problem using intelligent water drops. Teh. Vjesn. 2018, 25, 350–357.
  16. Mafarja, M.M.; Mirjalili, S. Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 2017, 260, 302–312.
  17. Ho, Y.C.; Pepyne, D.L. Simple explanation of the no-free-lunch theorem and its implications. J. Optim. Theory Appl. 2002, 115, 549–570.
  18. Crawford, B.; Soto, R.; Astorga, G.; García, J.; Castro, C.; Paredes, F. Putting continuous metaheuristics to work in binary search spaces. Complexity 2017, 2017, 8404231.
  19. García, J.; Moraga, P.; Valenzuela, M.; Crawford, B.; Soto, R.; Pinto, H.; Peña, A.; Altimiras, F.; Astorga, G. A Db-Scan Binarization Algorithm Applied to Matrix Covering Problems. Comput. Intell. Neurosci. 2019, 2019, 3238574.
  20. Maniezzo, V.; Stützle, T.; Voß, S. (Eds.) Matheuristics—Volume 10 of Annals of Information Systems; Springer: Berlin/Heidelberg, Germany, 2010.
  21. Juan, A.A.; Faulin, J.; Grasman, S.E.; Rabe, M.; Figueira, G. A review of simheuristics: Extending metaheuristics to deal with stochastic combinatorial optimization problems. Oper. Res. Perspect. 2015, 2, 62–72.
  22. Juan, A.A.; Keenan, P.; Martı, R.; McGarraghy, S.; Panadero, J.; Carroll, P.; Oliva, D. A Review of the Role of Heuristics in Stochastic Optimisation: From Metaheuristics to Learnheuristics. Ann. Oper. Res. 2021.
  23. Arnau, Q.; Juan, A.A.; Serra, I. On the use of learnheuristics in vehicle routing optimization problems with dynamic inputs. Algorithms 2018, 11, 208.
  24. Bayliss, C.; Juan, A.A.; Currie, C.S.; Panadero, J. A learnheuristic approach for the team orienteering problem with aerial drone motion constraints. Appl. Soft Comput. 2020, 92, 106280.
  25. Lavecchia, A. Machine-learning approaches in drug discovery: Methods and applications. Drug Discov. Today 2015, 20, 318–331.
  26. Najafabadi, M.M.; Villanustre, F.; Khoshgoftaar, T.M.; Seliya, N.; Wald, R.; Muharemagic, E. Deep learning applications and challenges in big data analytics. J. Big Data 2015, 2, 1.
  27. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
  28. Celebi, M.E.; Aydin, K. Unsupervised Learning Algorithms; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9.
  29. Valdivia, S.; Soto, R.; Crawford, B.; Caselli, N.; Paredes, F.; Castro, C.; Olivares, R. Clustering-based binarization methods applied to the crow search algorithm for 0/1 combinatorial problems. Mathematics 2020, 8, 1070.
  30. Lorbeer, B.; Kosareva, A.; Deva, B.; Softić, D.; Ruppel, P.; Küpper, A. Variations on the clustering algorithm BIRCH. Big Data Res. 2018, 11, 44–53.
  31. Chapelle, O.; Scholkopf, B.; Zien, A. Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews]. IEEE Trans. Neural Netw. 2009, 20, 542.
  32. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
  33. Veček, N.; Mernik, M.; Filipič, B.; Črepinšek, M. Parameter tuning with Chess Rating System (CRS-Tuning) for meta-heuristic algorithms. Inf. Sci. 2016, 372, 446–469.
  34. Ries, J.; Beullens, P. A semi-automated design of instance-based fuzzy parameter tuning for metaheuristics based on decision tree induction. J. Oper. Res. Soc. 2015, 66, 782–793.
  35. Deng, Y.; Liu, Y.; Zhou, D. An improved genetic algorithm with initial population strategy for symmetric TSP. Math. Probl. Eng. 2015, 2015, 212794.
  36. García, J.; Crawford, B.; Soto, R.; Astorga, G. A percentile transition ranking algorithm applied to binarization of continuous swarm intelligence metaheuristics. In International Conference on Soft Computing and Data Mining; Springer: Cham, Switzerland, 2018; pp. 3–13.
  37. García, J.; Altimiras, F.; Peña, A.; Astorga, G.; Peredo, O. A binary cuckoo search big data algorithm applied to large-scale crew scheduling problems. Complexity 2018, 2018, 8395193.
  38. de León, A.D.; Lalla-Ruiz, E.; Melián-Batista, B.; Moreno-Vega, J.M. A Machine Learning-based system for berth scheduling at bulk terminals. Expert Syst. Appl. 2017, 87, 170–182.
  39. Asta, S.; Özcan, E.; Curtois, T. A tensor based hyper-heuristic for nurse rostering. Knowl. Based Syst. 2016, 98, 185–199.
  40. Martin, S.; Ouelhadj, D.; Beullens, P.; Ozcan, E.; Juan, A.A.; Burke, E.K. A multi-agent based cooperative approach to scheduling and routing. Eur. J. Oper. Res. 2016, 254, 169–178.
  41. Song, H.; Triguero, I.; Özcan, E. A review on the self and dual interactions between machine learning and optimisation. Prog. Artif. Intell. 2019, 8, 143–165.
  42. Zhang, H.; Lu, J. Adaptive evolutionary programming based on reinforcement learning. Inf. Sci. 2008, 178, 971–984.
  43. Dorigo, M.; Birattari, M.; Stutzle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
  44. Gambardella, L.M.; Dorigo, M. Ant-Q: A reinforcement learning approach to the traveling salesman problem. In Machine Learning Proceedings 1995; Elsevier: Amsterdam, The Netherlands, 1995; pp. 252–260.
  45. Khamassi, I.; Hammami, M.; Ghédira, K. Ant-Q hyper-heuristic approach for solving 2-dimensional cutting stock problem. In Proceedings of the 2011 IEEE Symposium on Swarm Intelligence, Paris, France, 11–15 April 2011; pp. 1–7.
  46. Choong, S.S.; Wong, L.P.; Lim, C.P. Automatic design of hyper-heuristic based on reinforcement learning. Inf. Sci. 2018, 436, 89–107.
  47. Mosadegh, H.; Ghomi, S.F.; Süer, G. Stochastic mixed-model assembly line sequencing problem: Mathematical modeling and Q-learning based simulated annealing hyper-heuristics. Eur. J. Oper. Res. 2020, 282, 530–544.
  48. Burke, E.K.; Gendreau, M.; Hyde, M.; Kendall, G.; McCollum, B.; Ochoa, G.; Parkes, A.J.; Petrovic, S. The cross-domain heuristic search challenge—An international research competition. In International Conference on Learning and Intelligent Optimization; Springer: Cham, Switzerland, 2011; pp. 631–634.
  49. Li, Z.; Li, S.; Yue, C.; Shang, Z.; Qu, B. Differential evolution based on reinforcement learning with fitness ranking for solving multimodal multiobjective problems. Swarm Evol. Comput. 2019, 49, 234–244.
  50. Alimoradi, M.R.; Kashan, A.H. A league championship algorithm equipped with network structure and backward Q-learning for extracting stock trading rules. Appl. Soft Comput. 2018, 68, 478–493.
  51. Sadhu, A.K.; Konar, A.; Bhattacharjee, T.; Das, S. Synergism of firefly algorithm and Q-learning for robot arm path planning. Swarm Evol. Comput. 2018, 43, 50–68.
  52. Zamli, K.Z.; Din, F.; Ahmed, B.S.; Bures, M. A hybrid Q-learning sine-cosine-based strategy for addressing the combinatorial test suite minimization problem. PLoS ONE 2018, 13, e0195675.
  53. Yang, X.S. Firefly Algorithms. In Nature-Inspired Optimization Algorithms; Elsevier: Amsterdam, The Netherlands, 2021; pp. 123–139.
  54. Xu, Y.; Pi, D. A reinforcement learning-based communication topology in particle swarm optimization. Neural Comput. Appl. 2019, 32, 10007–10032.
  55. Das, P.; Behera, H.; Panigrahi, B. Intelligent-based multi-robot path planning inspired by improved classical Q-learning and improved particle swarm optimization with perturbed velocity. Eng. Sci. Technol. Int. J. 2016, 19, 651–669.
  56. Abed-alguni, B.H. Action-selection method for reinforcement learning based on cuckoo search algorithm. Arab. J. Sci. Eng. 2018, 43, 6771–6785. [Google Scholar] [CrossRef]
  57. Abed-alguni, B.H. Bat Q-learning algorithm. Jordanian J. Comput. Inf. Technol. (JJCIT) 2017, 3, 56–77. [Google Scholar]
  58. Arin, A.; Rabadi, G. Integrating estimation of distribution algorithms versus Q-learning into Meta-RaPS for solving the 0–1 multidimensional knapsack problem. Comput. Ind. Eng. 2017, 112, 706–720. [Google Scholar] [CrossRef]
  59. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
  60. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Ala’M, A.Z.; Mirjalili, S.; Fujita, H. An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowl. Based Syst. 2018, 154, 43–67. [Google Scholar] [CrossRef]
  61. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl. Based Syst. 2018, 161, 185–204. [Google Scholar] [CrossRef]
  62. Mirjalili, S.; Hashim, S.Z.M. BMOA: Binary magnetic optimization algorithm. Int. J. Mach. Learn. Comput. 2012, 2, 204. [Google Scholar] [CrossRef] [Green Version]
  63. Crawford, B.; Soto, R.; Olivares-Suarez, M.; Palma, W.; Paredes, F.; Olguin, E.; Norero, E. A binary coded firefly algorithm that solves the set covering problem. Rom. J. Inf. Sci. Technol 2014, 17, 252–264. [Google Scholar]
  64. Crawford, B.; Soto, R.; Berríos, N.; Johnson, F.; Paredes, F.; Castro, C.; Norero, E. A binary cat swarm optimization algorithm for the non-unicost set covering problem. Math. Probl. Eng. 2015, 2015, 578541. [Google Scholar] [CrossRef] [Green Version]
  65. Soto, R.; Crawford, B.; Olivares, R.; Barraza, J.; Figueroa, I.; Johnson, F.; Paredes, F.; Olguin, E. Solving the non-unicost set covering problem by using cuckoo search and black hole optimization. Nat. Comput. 2017, 16, 213–229. [Google Scholar] [CrossRef]
  66. Soto, R.; Crawford, B.; Olivares, R.; Taramasco, C.; Figueroa, I.; Gómez, A.; Castro, C.; Paredes, F. Adaptive Black Hole Algorithm for Solving the Set Covering Problem. Math. Probl. Eng. 2018, 2018, 2183214. [Google Scholar] [CrossRef]
  67. Leonard, B.J.; Engelbrecht, A.P.; Cleghorn, C.W. Critical considerations on angle modulated particle swarm optimisers. Swarm Intell. 2015, 9, 291–314. [Google Scholar] [CrossRef]
  68. Zhang, G. Quantum-inspired evolutionary algorithms: A survey and empirical study. J. Heuristics 2011, 17, 303–351. [Google Scholar] [CrossRef]
  69. Saremi, S.; Mirjalili, S.; Lewis, A. How important is a transfer function in discrete heuristic algorithms. Neural Comput. Appl. 2015, 26, 625–640. [Google Scholar] [CrossRef] [Green Version]
70. Kennedy, J.; Eberhart, R.C. A discrete binary version of the particle swarm algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Orlando, FL, USA, 12–15 October 1997; Volume 5, pp. 4104–4108. [Google Scholar]
  71. Rajalakshmi, N.; Subramanian, D.P.; Thamizhavel, K. Performance enhancement of radial distributed system with distributed generators by reconfiguration using binary firefly algorithm. J. Inst. Eng. (India) Ser. B 2015, 96, 91–99. [Google Scholar] [CrossRef]
  72. Crawford, B.; Soto, R.; Peña, C.; Riquelme-Leiva, M.; Torres-Rojas, C.; Johnson, F.; Paredes, F. Binarization methods for shuffled frog leaping algorithms that solve set covering problems. In Software Engineering in Intelligent Systems; Springer: Cham, Switzerland, 2015; pp. 317–326. [Google Scholar]
  73. Tamayo-Vera, D.; Chen, S.; Bolufé-Röhler, A.; Montgomery, J.; Hendtlass, T. Improved Exploration and Exploitation in Particle Swarm Optimization. In Recent Trends and Future Technology in Applied Intelligence; Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 421–433. [Google Scholar]
  74. Črepinšek, M.; Liu, S.H.; Mernik, M. Exploration and Exploitation in Evolutionary Algorithms: A Survey. ACM Comput. Surv. 2013, 45. [Google Scholar] [CrossRef]
75. Olorunda, O.; Engelbrecht, A.P. Measuring exploration/exploitation in particle swarms using swarm diversity. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 1128–1134. [Google Scholar] [CrossRef]
  76. Burke, E.K.; Hyde, M.R.; Kendall, G.; Ochoa, G.; Özcan, E.; Woodward, J.R. A Classification of Hyper-Heuristic Approaches: Revisited. In Handbook of Metaheuristics; Gendreau, M., Potvin, J.Y., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 453–477. [Google Scholar] [CrossRef]
  77. Oyebolu, F.B.; Allmendinger, R.; Farid, S.S.; Branke, J. Dynamic scheduling of multi-product continuous biopharmaceutical facilities: A hyper-heuristic framework. Comput. Chem. Eng. 2019, 125, 71–88. [Google Scholar] [CrossRef] [Green Version]
  78. Leng, L.; Zhao, Y.; Wang, Z.; Zhang, J.; Wang, W.; Zhang, C. A Novel Hyper-Heuristic for the Biobjective Regional Low-Carbon Location-Routing Problem with Multiple Constraints. Sustainability 2019, 11, 1596. [Google Scholar] [CrossRef] [Green Version]
  79. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  80. Nareyek, A. Choosing search heuristics by non-stationary reinforcement learning. In Metaheuristics: Computer Decision-Making; Springer: Boston, MA, USA, 2003; pp. 523–544. [Google Scholar]
  81. Salleh, M.N.M.; Hussain, K.; Cheng, S.; Shi, Y.; Muhammad, A.; Ullah, G.; Naseem, R. Exploration and exploitation measurement in swarm-based metaheuristic algorithms: An empirical analysis. In International Conference on Soft Computing and Data Mining; Springer: Cham, Switzerland, 2018; pp. 24–32. [Google Scholar]
  82. Cheng, S.; Shi, Y.; Qin, Q.; Zhang, Q.; Bai, R. Population Diversity Maintenance In Brain Storm Optimization Algorithm. J. Artif. Intell. Soft Comput. Res. 2014, 4, 83–97. [Google Scholar] [CrossRef] [Green Version]
  83. Mattiussi, C.; Waibel, M.; Floreano, D. Measures of diversity for populations and distances between individuals with highly reorganizable genomes. Evol. Comput. 2004, 12, 495–515. [Google Scholar] [CrossRef] [Green Version]
  84. Lynn, N.; Suganthan, P.N. Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation. Swarm Evol. Comput. 2015, 24, 11–24. [Google Scholar] [CrossRef]
  85. Hussain, K.; Zhu, W.; Salleh, M.N.M. Long-term memory Harris’ hawk optimization for high dimensional and optimal power flow problems. IEEE Access 2019, 7, 147596–147616. [Google Scholar] [CrossRef]
  86. Abualigah, L.; Diabat, A. Advances in Sine Cosine Algorithm: A comprehensive survey. Artif. Intell. Rev. 2021, 54, 2567–2608. [Google Scholar] [CrossRef]
  87. Mirjalili, S.; Mirjalili, S.M.; Saremi, S.; Mirjalili, S. Whale optimization algorithm: Theory, literature review, and application in designing photonic crystal filters. Nat. Inspired Optim. 2020, 219–238. [Google Scholar] [CrossRef]
  88. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef] [Green Version]
  89. Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl. Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
  90. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  91. Hassan, A.A.; Abdullah, S.; Zamli, K.Z.; Razali, R. Combinatorial Test Suites Generation Strategy Utilizing the Whale Optimization Algorithm. IEEE Access 2020, 9, 192288–192303. [Google Scholar] [CrossRef]
  92. Lanza-Gutierrez, J.M.; Crawford, B.; Soto, R.; Berrios, N.; Gomez-Pulido, J.A.; Paredes, F. Analyzing the effects of binarization techniques when solving the set covering problem through swarm optimization. Expert Syst. Appl. 2017, 70, 67–82. [Google Scholar] [CrossRef]
  93. Osaba, E.; Carballedo, R.; Diaz, F.; Onieva, E.; Masegosa, A.D.; Perallos, A. Good practice proposal for the implementation, presentation, and comparison of metaheuristics for solving routing problems. Neurocomputing 2018, 271, 2–8. [Google Scholar] [CrossRef] [Green Version]
  94. Bisong, E. Google colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Springer: Berlin/Heidelberg, Germany, 2019; pp. 59–64. [Google Scholar]
  95. Crawford, B.; León de la Barra, C. Los Algoritmos Ambidiestros. 2020. Available online: https://www.mercuriovalpo.cl/impresa/2020/07/13/full/cuerpo-principal/15/ (accessed on 12 February 2021).
  96. Lemus-Romani, J.; Crawford, B.; Soto, R.; Astorga, G.; Misra, S.; Crawford, K.; Foschino, G.; Salas-Fernández, A.; Paredes, F. Ambidextrous Socio-Cultural Algorithms. In International Conference on Computational Science and Its Applications; Springer: Cham, Switzerland, 2020; pp. 923–938. [Google Scholar]
  97. Cisternas-Caneo, F.; Crawford, B.; Soto, R.; de la Fuente-Mella, H.; Tapia, D.; Lemus-Romani, J.; Castillo, M.; Becerra-Rozas, M.; Paredes, F.; Misra, S. A Data-Driven Dynamic Discretization Framework to Solve Combinatorial Problems Using Continuous Metaheuristics. In Innovations in Bio-Inspired Computing and Applications; Springer International Publishing: Cham, Switzerland, 2021; pp. 76–85. [Google Scholar]
  98. Tapia, D.; Crawford, B.; Soto, R.; Cisternas-Caneo, F.; Lemus-Romani, J.; Castillo, M.; García, J.; Palma, W.; Paredes, F.; Misra, S. A Q-Learning Hyperheuristic Binarization Framework to Balance Exploration and Exploitation. In International Conference on Applied Informatics; Springer: Cham, Switzerland, 2020; pp. 14–28. [Google Scholar]
Figure 1. Two-step binarization scheme.
Figure 2. S-shape (a) and V-shape (b) transfer functions.
Figure 3. Proposed structure for the binarization scheme with Q-Learning.
Figure 4. Q-Learning scheme for different rewards.
Figure 5. SCA violin chart for instance 53.
Figure 6. SCA violin chart for instance c5.
Figure 7. WOA violin chart for instance 53.
Figure 8. WOA violin chart for instance c5.
Figure 9. SCA exploration and exploitation chart for instance 53, QL5 version.
Figure 10. SCA exploration and exploitation chart for instance 53, BCL1 version.
Figure 11. SCA exploration and exploitation chart for instance c5, QL5 version.
Figure 12. SCA exploration and exploitation chart for instance c5, BCL1 version.
Figure 13. WOA exploration and exploitation chart for instance 58, QL5 version.
Figure 14. WOA exploration and exploitation chart for instance 58, MIR2 version.
Figure 15. WOA exploration and exploitation chart for instance d4, QL5 version.
Figure 16. WOA exploration and exploitation chart for instance d4, MIR2 version.
Table 1. Q-Learning implementation names.

Reward Type                                           Name
withPenalty1 (Equation (15))                          QL1
withOutPenalty1 (Equation (16))                       QL2
globalBest (Equation (17))                            QL3
rootAdaptation (Equation (18))                        QL4
escalatingMultiplicativeAdaptation (Equation (19))    QL5
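As a rough illustration of how these reward variants plug into the operator-selection loop, the following minimal Python sketch performs an epsilon-greedy choice of binarization scheme and a standard one-step Q-Learning update. The states, action set, parameter values, and the toy reward function are illustrative assumptions only; they are not the exact definitions used in Equations (15)–(19) or in the experiments.

```python
import random

# Hedged sketch: Q-Learning selection of binarization schemes (actions).
# The action set, states, parameters, and reward are placeholders; the
# paper's reward variants QL1-QL5 correspond to Equations (15)-(19).

ACTIONS = ["S1", "S2", "S3", "S4", "V1", "V2", "V3", "V4"]   # example action set
ALPHA, GAMMA, EPSILON = 0.1, 0.4, 0.1                        # assumed parameters

q_table = {(s, a): 0.0 for s in ("explore", "exploit") for a in ACTIONS}

def select_action(state):
    """Epsilon-greedy choice of a binarization scheme for the current state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def reward_with_penalty(improved):
    """Toy reward: +1 if the incumbent best fitness improved, -1 otherwise."""
    return 1.0 if improved else -1.0

def update(state, action, reward, next_state):
    """Standard one-step Q-Learning update (Watkins and Dayan, 1992)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])

# One illustrative step: the metaheuristic is in an "explore" phase, applies
# the chosen scheme, observes whether the best solution improved, and then
# transitions to an "exploit" phase.
a = select_action("explore")
update("explore", a, reward_with_penalty(improved=True), "exploit")
```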
Table 2. Recommended binarization schemes in the literature.

Cite    Binarization                  Transfer Function    Name
[92]    Elitist (Equation (12))       V4 (Equation (8))    BCL1
[59]    Complement (Equation (10))    V4 (Equation (8))    MIR2
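For concreteness, the sketch below shows how the two schemes in Table 2 could be applied to a single continuous dimension: the V4 transfer function maps the continuous value to a probability, and then either the Elitist (BCL1) or the Complement (MIR2) rule produces a bit. The V4 formula and both rules follow the definitions commonly used in this binarization literature [59,92]; they are stated here as assumptions, since Equations (8), (10), and (12) are given earlier in the paper.

```python
import math
import random

def v4(x):
    """V-shaped transfer function V4: |(2/pi) * arctan((pi/2) * x)| (assumed form)."""
    return abs((2 / math.pi) * math.atan((math.pi / 2) * x))

def elitist(prob, best_bit):
    """Elitist rule (BCL1, assumed): copy the corresponding bit of the
    best-known solution with probability prob, otherwise set 0."""
    return best_bit if random.random() < prob else 0

def complement(prob, current_bit):
    """Complement rule (MIR2, assumed): flip the current bit with
    probability prob, otherwise set 0."""
    return 1 - current_bit if random.random() < prob else 0

# Example: binarize one continuous dimension produced by SCA or WOA.
x_continuous = 1.7                      # illustrative continuous value
prob = v4(x_continuous)
bit_bcl1 = elitist(prob, best_bit=1)
bit_mir2 = complement(prob, current_bit=0)
```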
Table 3. SCA results. For each instance (Inst.) with known optimum (Opt.), the BCL, MIR, and QL1–QL5 versions each report four values: the best fitness found (Best), the average fitness (Avg), the runtime in seconds (Sec), and the relative percentage deviation RPD = 100 × (Best − Opt) / Opt.

Inst.   Opt.   BCL [Best, Avg, Sec, RPD]   MIR [Best, Avg, Sec, RPD]   QL1 [Best, Avg, Sec, RPD]   QL2 [Best, Avg, Sec, RPD]   QL3 [Best, Avg, Sec, RPD]   QL4 [Best, Avg, Sec, RPD]   QL5 [Best, Avg, Sec, RPD]
41429557580.0292.029.84545734.482883.027.04533538.02397.024.24530537.832414.023.54534537.52411.024.48533536.832455.024.24530535.172466.023.54
42512573605.78300.011.91550725.1863.07.42548552.891929.07.03537551.111860.04.88547552.671866.06.84552556.892012.07.81537552.371928.04.88
43516557598.83306.07.95559766.84994.08.33548552.671839.06.2543554.441886.05.23540555.01793.04.65535550.221864.03.68536547.051907.03.88
44494533557.06304.07.89547688.481127.010.73519531.222104.05.06530533.782045.07.29518531.332046.04.86512532.562111.03.64511532.842056.03.44
45512563591.5294.09.96565751.351030.010.35540549.221904.05.47537551.672065.04.88544552.562057.06.25541552.671959.05.66542549.792126.05.86
46560594635.22270.06.07591840.42793.05.54578587.331923.03.21577589.891890.03.04577588.221890.03.04584592.561873.04.29568589.31819.01.43
47430449483.44312.04.42456586.971829.06.05440448.112308.02.33442448.252242.02.79447450.52383.03.95447452.882185.03.95439451.952236.02.09
48492515565.67322.04.67518727.231052.05.28507514.61990.03.05512516.02011.04.07509513.02086.03.46508515.831836.03.25507515.041948.03.05
49641713759.75319.011.23698964.68918.08.89689695.672054.07.49696700.831882.08.58692696.831877.07.96688694.831907.07.33684698.481850.06.71
410514557580.0292.08.37545734.482883.06.03533538.02397.03.7530537.832414.03.11534537.52411.03.89533536.832455.03.7530535.172466.03.11
51253289303.33297.014.23282396.551561.011.46276281.02586.09.09279282.172022.010.28277282.52274.09.49278282.332099.09.88274282.392064.08.3
52302346366.92297.014.57335486.871210.010.93333334.51618.010.26334336.01578.010.6332336.51577.09.93328334.51582.08.61329336.281604.08.94
53226246258.17265.08.85238331.741067.05.31233235.51822.03.1231235.672010.02.21235236.171830.03.98236236.832015.04.42230236.131857.01.77
54242257276.5297.06.2253338.841183.04.55255256.01860.05.37253255.671870.04.55253255.172166.04.55252256.171943.04.13251254.961915.03.72
55211227237.92324.07.58226289.032449.07.11216221.02329.02.37218221.02218.03.32218222.332338.03.32221222.02388.04.74217221.092240.02.84
56213244258.58307.014.55234324.771725.09.86223230.672023.04.69221230.172076.03.76231232.82380.08.45228233.22089.07.04224231.22060.05.16
57293323342.75306.010.24313427.11318.06.83317319.61878.08.19310314.42081.05.8313317.332085.06.83317321.171905.08.19314317.841918.07.17
58288320333.3302.011.11302444.35773.04.86298299.331873.03.47300301.81952.04.17298301.831718.03.47298300.01769.03.47297301.391754.03.12
59279312326.92335.011.83298414.261156.06.81290293.671779.03.94291294.41757.04.3289293.51796.03.58292293.671781.04.66286293.171818.02.51
510265289303.33297.09.06282396.551561.06.42276281.02586.04.15279282.172022.05.28277282.52274.04.53278282.332099.04.91274282.392064.03.4
61138152165.2283.010.14348369.8189.0152.17141145.771407.02.17144148.16372.04.35144148.42377.04.35146148.29389.05.8146148.65385.05.8
62146170196.17226.016.44161484.97202.010.27157159.83313.07.53158159.83310.08.22157159.17307.07.53155158.0407.06.16154158.26361.05.48
63145156179.75257.07.59151436.71212.04.14149151.33290.02.76150151.67324.03.45151151.83312.04.14150151.67310.03.45149151.65305.02.76
64131139155.25216.06.11137303.0193.04.58135136.33423.03.05136136.17474.03.82134135.67414.02.29135136.2590.03.05134136.52416.02.29
65161193215.25251.019.88185450.06255.014.91177183.17521.09.94178183.67416.010.56182183.67354.013.04179183.17377.011.18175182.96403.08.7
a1253286302.8411.013.04272596.8899.07.51262267.136795.03.56266269.421862.05.14263268.811950.03.95263268.92025.03.95265269.681876.04.74
a2252289304.2489.014.68281577.52463.011.51271273.832045.07.54272273.672567.07.94273275.02223.08.33271272.832315.07.54270274.332096.07.14
a3232266283.44423.014.66250555.52390.07.76245248.62085.05.6246249.02010.06.03249252.02067.07.33251251.331998.08.19242248.742264.04.31
a4234271289.3458.015.81256544.71519.09.4250253.01845.06.84248253.61823.05.98249252.21832.06.41250252.21832.06.84247253.51995.05.56
a5236266286.86462.012.71253513.9796.07.2249250.671973.05.51248252.831962.05.08245250.332005.03.81247251.51995.04.66248251.221934.05.08
b16981108.6400.017.39527585.0364.0663.777071.741652.01.457172.68484.02.97272.9456.04.357272.87439.04.357273.03420.04.35
b27693110.33449.022.3781529.32383.06.587880.33430.02.638082.0490.05.268081.5484.05.267880.67468.02.637881.39414.02.63
b38090117.08426.012.584687.06371.05.08283.33517.02.58383.83432.03.758283.67443.02.58284.0580.02.58183.35497.01.25
b47996116.42445.021.5284582.87445.06.338384.0524.05.068384.83433.05.068284.33457.03.88384.83473.05.068485.26469.06.33
b57283104.09451.015.2875573.1356.04.177475.0469.02.787575.33385.04.177474.83436.02.787475.0475.02.787474.78419.02.78
c1227269302.8751.018.5254536.62061.011.89240245.5510072.05.73241251.322096.06.17244251.032120.07.49239251.02044.05.29244250.942257.07.49
c2219264284.0794.020.55243715.521153.010.96235242.51935.07.31241244.171867.010.05241243.331857.010.05236241.83034.07.76236242.171872.07.76
c3243273306.25727.012.35265745.741659.09.05261263.172145.07.41259263.172352.06.58260262.81703.07.0259262.832088.06.58256263.871635.05.35
c4219251280.67735.014.61235669.421326.07.31236237.02187.07.76233235.831830.06.39236237.831824.07.76234236.672455.06.85234237.132049.06.85
c5215239271.17693.011.16232569.451992.07.91228232.332266.06.05233234.332507.08.37232233.832181.07.91227231.832197.05.58226233.622276.05.12
d1608993.2625.048.3367701.4692.011.676264.422312.03.336466.0673.06.676466.0659.06.676465.81686.06.676366.06643.05.0
d26681105.83625.022.7369802.45698.04.556970.33729.04.556969.5716.04.556969.8680.04.556970.0630.04.556870.13682.03.03
d37281109.75671.012.579845.29739.09.727878.67690.08.337879.0797.08.337879.0678.08.337879.33705.08.337879.09673.08.33
d4626891.42639.09.6864675.35840.03.236364.0620.01.616364.5632.01.616363.83628.01.616464.4744.03.236364.3612.01.61
d56177101.0630.026.2367767.1592.09.846566.6669.06.566667.0615.08.26466.17733.04.926565.83745.06.566466.35629.04.92
284.16307.68412.7813.94290.16581.971025.8726.03269.16273.081913.625.55269.67273.921527.26.01277273.861520.846.08269.6273.891562.845.94267.36273.581503.965.1
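As a quick sanity check of the RPD column defined above, take instance 41 in Table 3: the BCL version reports Best = 557 against Opt = 429, so

$$\mathrm{RPD} = 100 \times \frac{557 - 429}{429} \approx 29.84\%,$$

which matches the tabulated value.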
Table 4. WOA results. Columns follow the same layout as Table 3 (Best, Avg, Sec, and RPD for each of the BCL, MIR, and QL1–QL5 versions).

Inst.   Opt.   BCL [Best, Avg, Sec, RPD]   MIR [Best, Avg, Sec, RPD]   QL1 [Best, Avg, Sec, RPD]   QL2 [Best, Avg, Sec, RPD]   QL3 [Best, Avg, Sec, RPD]   QL4 [Best, Avg, Sec, RPD]   QL5 [Best, Avg, Sec, RPD]
41429543582.82186.026.57664751.742869.054.78521529.173063.021.45530532.43094.023.54524530.03084.022.14530532.53230.023.54526531.643088.022.61
42512554581.72195.08.2699762.29668.036.52543548.672503.06.05538546.442454.05.08543548.02453.06.05534544.442577.04.3524547.02508.02.34
43516565597.22207.09.5717798.68898.038.95539546.892518.04.46537543.782620.04.07533540.332476.03.29535540.782626.03.68536544.112610.03.88
44494541559.89192.09.51635694.421084.028.54513522.892729.03.85519526.332788.05.06516524.112654.04.45513524.222804.03.85517525.552854.04.66
45512565591.0203.010.35700773.87894.036.72535541.432690.04.49537541.892728.04.88540545.782801.05.47537544.782801.04.88531545.02577.03.71
46560593626.22205.05.89745874.68670.033.04579584.442602.03.39573580.332635.02.32577583.112520.03.04577584.782410.03.04573582.352526.02.32
47430455482.17194.05.81540613.321733.025.58444446.292831.03.26440445.292916.02.33444447.142825.03.26438444.672835.01.86438445.02737.01.86
48492536566.67190.08.94732779.1886.048.78505509.52724.02.64505507.832596.02.64506510.172516.02.85505509.02411.02.64504508.912507.02.44
49641717751.42205.011.869461013.35720.047.58680689.02633.06.08686690.82552.07.02680684.252775.06.08680690.02460.06.08672689.042631.04.84
410514543582.82186.05.64664751.742869.029.18521529.173063.01.36530532.43094.03.11524530.03084.01.95530532.53230.03.11526531.643088.02.33
51253288298.33209.013.83369416.771423.045.85276277.02766.09.09277278.332825.09.49276279.332798.09.09274278.02731.08.3273277.482738.07.91
52302346368.33219.014.57456521.031075.050.99329332.832248.08.94326332.172366.07.95330332.832454.09.27327331.332340.08.28325331.962347.07.62
53226240251.42201.06.19323351.81878.042.92232233.672380.02.65232233.52385.02.65233234.52453.03.1231233.672459.02.21231233.962458.02.21
54242267275.67208.010.33330362.45947.036.36251252.672588.03.72250252.52824.03.31246250.02503.01.65250251.52712.03.31249252.352536.02.89
55211223236.92173.05.69274294.92347.029.86217218.332933.02.84216218.832788.02.37218219.332712.03.32217218.332955.02.84215218.222816.01.9
56213237255.08196.011.27311343.971626.046.01224228.332528.05.16227229.02725.06.57225227.02630.05.63228229.52681.07.04223228.042652.04.69
57293330337.75182.012.63403450.941207.037.54306311.832541.04.44311313.22481.06.14307310.22479.04.78303311.02530.03.41307311.812569.04.78
58288306328.43201.06.25408445.03705.041.67298298.52346.03.47298299.332444.03.47297298.02413.03.12295297.832592.02.43294297.872440.02.08
59279307322.82197.010.04403443.06951.044.44287289.82445.02.87284287.42557.01.79284289.172326.01.79287290.52480.02.87284289.572440.01.79
510265288298.33209.08.68369416.771423.039.25276277.02766.04.15277278.332825.04.53276279.332798.04.15274278.02731.03.4273277.482738.03.02
61138161170.4177.016.67336368.0188.0143.48143147.231558.03.62144146.68747.04.35144146.74809.04.35142146.39781.02.9143146.61795.03.62
62146164193.55181.012.33415506.68177.0184.25155156.17698.06.16154155.83634.05.48152156.0694.04.11156157.33712.06.85154156.65655.05.48
63145172194.5194.018.62390474.71157.0168.97149150.33722.02.76149150.4677.02.76148149.17646.02.07149150.33659.02.76147149.96661.01.38
64131136151.0221.03.82262318.9156.0100.0134134.83787.02.29132134.17874.00.76134134.67785.02.29131134.5807.00.0133135.04827.01.53
65161188209.17215.016.77379514.0170.0135.4178181.83807.010.56180181.5757.011.8176179.5857.09.32177179.17725.09.94175180.0733.08.7
a1253284300.8351.012.25583626.6343.0130.43261268.386741.03.16263266.843320.03.95264266.973309.04.35264266.873422.04.35264267.223308.04.35
a2252284306.12329.012.7553615.9285.0119.44271271.673349.07.54266269.833765.05.56265270.43505.05.16269271.03516.06.75266270.833430.05.56
a3232276284.75343.018.97505568.9299.0117.67242246.53323.04.31244245.63274.05.17242246.03198.04.31243245.53527.04.74240246.173172.03.45
a4234282308.67328.020.51518568.48308.0121.37245249.03125.04.7251251.83061.07.26246246.63152.05.13249250.03003.06.41244249.043174.04.27
a5236262283.88395.011.02531570.32288.0125.0246247.53430.04.24242247.333301.02.54241248.173489.02.12246248.173555.04.24243248.743245.02.97
b16990104.2316.030.43549592.4312.0695.657171.551581.02.97071.68859.01.457071.87866.01.456971.68903.00.07171.65955.02.9
b27694118.25359.023.68487587.03297.0540.797980.0985.03.957879.5883.02.637879.17907.02.637879.51003.02.637879.87915.02.63
b380110134.17360.037.5662766.94323.0727.58282.671120.02.58282.17996.02.58282.67962.02.58282.01049.02.58182.26934.01.25
b479101123.92338.027.85617683.74309.0681.018383.83996.05.068383.83907.05.068383.5932.05.068384.0909.05.068383.87986.05.06
b57282116.42334.013.89521603.65304.0623.617373.831010.01.397374.331046.01.397474.5913.02.787374.33929.01.397374.18962.01.39
c1227266280.4538.017.18707732.6447.0211.45243248.279112.07.05243247.814407.07.05241247.484305.06.17241247.294314.06.17243247.754535.07.05
c2219264280.5586.020.55703799.94455.0221.0236239.834657.07.76234238.834784.06.85238240.174071.08.68238239.64784.08.68232239.814285.05.94
c3243287322.2562.018.11798930.16445.0228.4255259.673760.04.94258260.833878.06.17261261.83996.07.41258261.333851.06.17256260.614010.05.35
c4219261283.58504.019.18721788.58431.0229.22232233.834007.05.94232233.833846.05.94228234.174033.04.11230233.53929.05.02229233.093927.04.57
c5215262288.83550.021.86692765.71460.0221.86227231.04063.05.58229231.334320.06.51229231.03952.06.51223228.674363.03.72226231.04326.05.12
d16099135.4548.065.0781869.4472.01201.676264.612206.03.336364.971297.05.06465.061295.06.676464.791233.06.676465.131263.06.67
d26684119.58553.027.27902988.87475.01266.676969.01481.04.556869.01355.03.036768.751310.01.526868.831341.03.036869.041325.03.03
d37293139.58600.029.179071082.39501.01159.727778.331344.06.947677.331430.05.567677.331421.05.567677.331469.05.567777.71463.06.94
d46278128.5541.025.81760880.65486.01125.816363.671281.01.616263.41336.00.06363.81386.01.616363.671403.01.616263.431389.00.0
d56187115.4520.042.62777877.1483.01173.776465.171297.04.926364.331368.03.286365.01363.03.286565.671374.06.566365.31354.03.28
286.91310.86308.9117.01572.09643.15765.42276.64267.02270.362585.274.94267.38270.292373.764.9266.842772331.334.75266.71270.22381.244.77265.24270.312344.24.27
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
