Abstract
The iteration times and learning efficiency of kernel incremental extreme learning machines are always affected by redundant nodes. A hybrid deep kernel incremental extreme learning machine (DKIELM) based on improved coyote and beetle swarm optimization methods was proposed in this paper. A hybrid intelligent optimization algorithm based on the improved coyote optimization algorithm (ICOA) and improved beetle swarm optimization algorithm (IBSOA) was proposed to optimize the parameters and determine the effective number of hidden layer neurons for the proposed DKIELM. A Gaussian global best-growing operator was adopted to replace the original growing operator in the intelligent optimization algorithm to improve the searching efficiency and convergence of COA. In the meantime, IBSOA was designed based on tent mapping inverse learning and dynamic mutation strategies to avoid falling into a local optimum. The experimental results demonstrated the feasibility and effectiveness of the proposed DKIELM, with encouraging performance compared with other ELMs.
Introduction
Since it was proposed by Huang et al., the extreme learning machine (ELM) has shown fast training speed and incomparable classification ability in the fields of machine learning and neural networks during the past decades [1, 2]. From the perspective of network structure, ELMs are single-hidden layer feedforward networks (SLFNs). ELMs offer SLFNs an efficient way to address drawbacks such as slow learning rates, many training epochs, and local minima [3], as they do not require any iterative techniques [4, 5].
The hidden nodes, input weights, and hidden biases of ELM are all generated randomly. Meanwhile, the output weights of the neurons are calculated using the conventional Moore–Penrose inverse based on the least-squares method. Therefore, no parameter tuning process is required to generate the network model before training, and this generation is entirely independent of the training data. ELM has incomparable benefits compared to other machine learning methods in terms of training speed, global optimality, and generalization performance.
Although the effectiveness of ELM has been proved in some specific applications, in practice training data are often added one by one or chunk by chunk. Therefore, a wide range of improved ELMs has been developed. The incremental ELM (I-ELM) proposed by Huang et al. [6] can calculate the output weights for the ever-increasing hidden nodes more accurately while remaining a universal approximator. The adaptive growth ELM (AG-ELM) proposed by Zhang et al. [7], which combines the growth of hidden nodes, the incremental renewal of the network weights, and the sequential generation of grouped new networks, can approximate nonlinear functions effectively. ELM can also be transformed into an online version based on online sequential analysis [8]. As for generalization, the kernel ELM (KELM) based on the kernel function was proposed in [9] for classification problems, and a sequential online ELM based on KELM was proposed in [10] for online learning. By optimizing the input weights, Sun et al. [11] proposed a differential evolutionary ELM with a compact network structure to improve the generalization performance. However, current versions of I-ELM still suffer from redundant nodes that continually increase the complexity of the network structure and reduce the learning efficiency. The convergence rate and prediction ability remain low due to the unfavorable number of hidden layer nodes and the sensitivity to new data.
Training speed and learning accuracy, which are crucial for ELM, depend heavily on the parameter optimization process. Therefore, researchers have proposed several intelligent optimization methods to compute ELM parameters more precisely, and improving the training speed and learning accuracy of ELMs based on bionics methods is becoming a research focus. A differential evolution algorithm is utilized in [12] to optimize the ELM input parameters. An adaptive differential evolution algorithm is given in [13] to optimize the hidden layer node parameters, with the Moore–Penrose generalized inverse used to calculate the output weights. An improved particle swarm optimization algorithm is proposed in [14] to optimize the hidden layer node parameters.
Although the traditional differential evolution optimization method has strong global optimization ability, it cannot avoid premature convergence. The traditional particle swarm optimization algorithm has a low search speed despite its local optimization merits. Therefore, a hybrid optimization approach called DEPSO, based on the differential evolution algorithm and the particle swarm optimization method, is given in [15] to optimize the hidden layer nodes. A novel hybrid optimization method given in [16] took advantage of the global search ability of the differential evolution algorithm and the local search capability of the multi-group gray wolf optimization algorithm. However, problems of local optima and poor population diversity in later iterations remain.
As an extension of the research in [16], a Hybrid Intelligent Deep Kernel Incremental Extreme Learning Machine (HI-DKIELM) based on the improved coyote optimization algorithm (ICOA) and improved beetle swarm optimization algorithm (IBSOA) is proposed in this paper. First, COA is improved based on the Gaussian global best-growing operator to enhance the searching efficiency and convergence speed. Second, tent mapping inverse learning and dynamic mutation strategies are adopted to prevent BSOA from falling into a local optimum and to improve the population diversity in later iterations. Finally, the hybrid intelligent optimization method is adopted to tune the parameters of a deep kernel incremental extreme learning machine (DKIELM) and improve the training speed and classification accuracy. To the best of our knowledge, this paper is the first to propose the HCOBSO strategy.
The significant contributions of this paper are summarized as follows:
1. An improved Coyote Optimization Algorithm (ICOA) is proposed, in which a Gaussian global best-growing operator replaces the original growing operator of COA to enhance the searching efficiency and convergence speed.
2. An improved Beetle Swarm Optimization Algorithm (IBSOA) is designed based on tent mapping inverse learning and dynamic mutation strategies to prevent BSOA from falling into a local optimum. Owing to its high solving speed and accuracy, BSOA has been used successfully in signal positioning and data classification. However, the original BSOA has several drawbacks. First, it falls into a local optimum easily when solving complex optimization problems. Second, the computational complexity of its parameter update is very high. Third, in multi-dimensional function optimization, relying on a single individual search alone increases the possibility of the optimization algorithm running into a local optimum. Therefore, BSOA is improved in this paper using tent mapping inverse learning and dynamic mutation strategies.
3. The novel HCOBSO method is proposed for optimizing the parameters of DKIELM. The proposed hybrid intelligent optimization method takes advantage of the global search ability of COA and the local search capability of BSOA.
The remainder of this paper is organized as follows: the related works are briefly reviewed in section “Preliminary”. Section “The proposed HCOBSO method” presents the HCOBSO algorithm. Section “The proposed HI-DKIELM” presents the details of the DKIELM. The experimental results are presented in section “Results and discussion”. Section “Conclusion” concludes our work and outlines our future work.
Preliminary
In this part, the notions of the kernel incremental extreme learning machine (KI-ELM), the Gaussian global best-growing operator, tent mapping inverse learning, and dynamic mutation strategies are provided for the convenience of understanding the proposed ELM algorithm.
Kernel incremental extreme learning machine
Suppose training set \({\left\{{x}_{i},{t}_{i}\right\}}_{i=1}^{N}\) is composed of \(N\) training samples, the input \({x}_{i}\) has \(d\) dimensions, and \({t}_{i}\) is the label of the output, then the output of the ELM is [17, 18]
The kernel matrix of KI-ELM can be expressed as
thus, the output function of KI-ELM can be given as
In Eq. (3), assuming \(A = \left[ {\frac{1}{C} + K_{{{\text{ELM}}}} } \right]\), then at moment t, there are
Thus, at \(t + 1\), there are
Suppose
Then, Eq. (5) can be simplified as
For the new input data, there are
The output value \(\hat{Y}_{{{\text{test}}}} \) can be estimated online as
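To make the kernel-based readout described above concrete, the following minimal sketch solves the regularized kernel-ELM system \(\left[\frac{1}{C}I + K_{\text{ELM}}\right]\alpha = T\) and predicts with the kernel row vector of a new sample. The function names, the Gaussian kernel choice, and the parameters \(C\) and \(\gamma\) are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kelm_train(X, T, C=1.0, gamma=1.0):
    # Solve (I / C + K) alpha = T for the output weights
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(X_train, alpha, X_new, gamma=1.0):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] @ alpha
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

Because the solution is expressed entirely through the kernel matrix, a new sample is handled by evaluating the kernel against the stored training set, which is the property the online KI-ELM update exploits.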
Gaussian global best-growing operator
The Gaussian global best-growing operator is based on the global-best strategy and the Gaussian distribution, and is inspired by the global-best harmony search algorithm. The global-best strategy uses the state information of the current globally optimal target to improve the mining ability and enhance information sharing within the population. The Gaussian distribution, or normal distribution, differs from the uniform random distribution on \([0,1]\) utilized in COA; it expands the scope of the search and improves the global search ability to a certain extent.
Tent mapping inverse learning
Research has shown that in swarm intelligence search, the convergence performance is affected by the initial population: the larger and more evenly distributed the population, the faster the algorithm converges to the optimal solution. A population initialized with chaotic mapping is random, ergodic, and bounded, so the search efficiency can be improved.
The initial sequence generated by tent mapping [19, 20] is more uniform than that generated by logistic mapping. Therefore, tent mapping is applied to BSOA to initialize the population, and inverse learning strategies are applied to optimize the initial population. Through competition, the better individuals are selected for the next generation of learning. In this way, the population search scope is expanded, invalid search operations are reduced, and the convergence speed is improved. The mathematical expression is as follows [21]:
Defining a feasible solution in D-dimensional space as \(x = \left( {x_{1} ,x_{2} , \ldots ,x_{D} } \right)\) with \(x_{i} \in \left[ {a,b} \right]\), the inverse solution is \(x^{\prime} = \left( {x_{1}^{\prime } ,x_{2}^{\prime } , \ldots ,x_{D}^{\prime } } \right)\), where \(x_{i}^{\prime } = a + b - x_{i}\).
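The two ingredients can be sketched as follows; the starting value \(x_0\) and the restart guard against the tent map's degenerate fixed point are illustrative assumptions:

```python
import numpy as np

def tent_map(n, x0=0.37):
    # Tent chaotic map: x_{k+1} = 2 x_k if x_k < 0.5 else 2 (1 - x_k).
    # In double precision the orbit eventually hits the fixed point 0,
    # so a tiny restart value is injected to keep the sequence chaotic.
    seq, x = np.empty(n), x0
    for k in range(n):
        x = 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)
        if x == 0.0:
            x = 1e-3  # escape the degenerate orbit
        seq[k] = x
    return seq

def inverse_solution(x, a, b):
    # Opposition-based (inverse) learning on the interval [a, b]:
    # x'_i = a + b - x_i
    return a + b - x
```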
Dynamic mutation strategies
In later iterations of BSOA, the decreasing population diversity reduces the search capacity. Dynamic mutation strategies are introduced to increase the population diversity in later iterations and improve the convergence accuracy, so that premature convergence can be avoided. Scholars have proposed a variety of mutation operators, such as the Gaussian mutation and the Cauchy mutation [22, 23]. Compared with the Gaussian operator, the Cauchy operator has longer tails and can generate a larger range of random numbers, offering a greater chance to jump out of a local optimum. In the meantime, because its peak value is lower, the Cauchy mutation spends less time searching the immediate neighborhood.
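The tail difference is easy to check numerically. The sketch below (sample size and seed are arbitrary) compares how often each distribution produces a jump larger than three standard units; analytically, \(P(|C| > 3) = 1 - \frac{2}{\pi}\arctan 3 \approx 0.205\), whereas \(P(|N| > 3) \approx 0.003\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.normal(size=n)             # N(0, 1) samples
cauchy = rng.standard_cauchy(size=n)   # standard Cauchy samples

# Fraction of "large jumps" under each operator: the Cauchy operator's
# heavy tails give a mutated individual far more chances to escape a
# local optimum than the Gaussian operator does.
far_gauss = np.mean(np.abs(gauss) > 3)
far_cauchy = np.mean(np.abs(cauchy) > 3)
print(far_gauss, far_cauchy)
```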
The proposed HCOBSO method
Output weight \(\beta\) is the most important parameter of an ELM. Therefore, the number of hidden layer nodes \(L\) should be estimated in advance. In this section, the HCOBSO method based on ICOA and IBSO is proposed. The motivation for the new idea is stated in section “Motivations”. The concepts of ICOA and IBSO are explained in detail in sections “The improved COA method” and “The improved BSO method”, and the implementation of the proposed hybrid intelligent optimization method is summed up in section “The proposed HCOBSO method”. This organization facilitates an understanding of the proposed optimization method.
Motivations
Although COA and BSO are both swarm intelligence optimization algorithms, they differ in some respects. In terms of searching, COA simulates the growing and dying processes of coyotes, whereas BSO simulates the swarm hierarchy and foraging patterns of beetles. In terms of guidance, COA uses only the single best coyote to guide the growth of the others. Therefore, the two methods were improved, respectively, and then fused to compensate for the drawbacks of each, yielding the HCOBSO method. The proposed hybrid intelligent optimization method takes advantage of the global search ability of COA and the local search capability of BSOA.
The improved COA method
The COA method is inspired by the Canis latrans species of North America and is classified as both a swarm intelligence and an evolutionary heuristic method. Unlike other optimization methods, COA has a distinct structure and does not focus on social hierarchy and dominance rules.
In this paper, we introduce the Gaussian global best-growing operator to replace the original growing operator in the COA method to enhance the searching efficiency and convergence speed. In the following, each step in the proposed COA framework will be explained in detail.
1. Parameter initialization and random initialization of the coyote pack. First, the global population of coyotes, consisting of \(N_{p}\) packs each with \(N_{c}\) coyotes, is initialized. The initial social condition of the \(c{\text{th}}\) coyote of the \(p{\text{th}}\) pack in the \(j{\text{th}}\) dimension of the search space is set randomly, as follows [24, 25]:
$$ {\text{soc}}_{c,j} = {\text{lb}}_{j} + r_{j} \times \left( {{\text{ub}}_{j} - {\text{lb}}_{j} } \right) , $$(12)
where \({\text{lb}}_{j}\) and \( {\text{ub}}_{j}\) represent the lower and upper bounds of the \(j{\text{th}}\) decision variable, and \(r_{j}\) is a real random number generated inside the range \(\left[ {0,1} \right]\) using uniform probability. Then, the social fitness value of the coyote is evaluated as
2. Coyote growing inside the pack. The information of the coyotes is computed with the COA as the cultural tendency \({\text{cult}}\) of the pack:
$$ {\text{cult}}_{j} = \left\{ {\begin{array}{*{20}l} {O_{{\left( {N_{c} + 1} \right)/2,j}} } & {N_{c} \;{\text{is}}\;{\text{odd}}} \\ {\frac{{O_{{N_{c} /2,j}} + O_{{N_{c} /2 + 1,j}} }}{2}} & {{\text{otherwise}}} \\ \end{array} } \right.. $$(14)
Therefore, the coyotes are under the alpha influence \(\delta_{1}\) and the pack influence \(\delta_{2}\). \(\delta_{1}\) means a cultural difference from a random coyote of the pack \({\text{cr}}_{{1}}\) to the alpha coyote. \(\delta_{2}\) means a cultural difference from a random coyote of the pack \({\text{cr}}_{2}\) to the cultural tendency of the pack. \(\delta_{1} \) and \(\delta_{2}\) are written as
The \({\text{alpha}}\) is the best coyote inside the pack, and the \({\text{alpha}}\) of the \(p{\text{th}}\) pack at the \(t{\text{th}}\) instant of time is defined as
Unlike the original COA method, in this paper, the new social condition of coyote: \({\text{new}}\_{\text{soc}}_{1}\) is updated using the Gaussian global best-growing operator through the following equations:
\({\text{GP}}\) is the global best coyote in the current population, and \(rn_{1}\) and \(rn_{2}\) are real random numbers drawn from a Gaussian normal distribution. Compared with the original COA, the differences between individuals are large at the beginning of the search, which enhances the search ability in the early stage. In the later stage, the differences between individuals become small and the search space shrinks accordingly. Therefore, the new social condition of the coyote can improve the exploitation capacity.
3. Birth and death of coyotes. The birth of a new coyote pup is written as a combination of the social conditions of two parents and an environmental influence:
$$ {\text{pup}}_{j} = \left\{ {\begin{array}{*{20}c} {{\text{soc}}_{{cr_{1,j} }} } & {r_{j} < p_{{\text{s}}} \;{\text{or}}\;j = j_{1} } \\ {{\text{soc}}_{{cr_{2,j} }} } & {r_{j} \ge p_{{\text{s}}} + p_{{\text{a}}} \;{\text{or}}\;j = j_{2} } \\ {R_{j} } & {{\text{otherwise}}} \\ \end{array} } \right\}, $$(20)
wherein \(r_{j}\) is the random number in the range \(\left[ {0,1} \right]\) using uniform probability, \(j_{1}\) and \(j_{2}\) are two random dimensions of the problem, \(R_{j}\) is a random number inside the decision variable bound of the \(j{\text{th}}\) dimension, and \(p_{{\text{s}}}\) and \(p_{{\text{a}}}\) are scatter probability and association probability, respectively. Besides, \(p_{{\text{s}}}\) and \(p_{{\text{a}}}\) can be written as
To keep the population size static, the COA syncs the birth and death of coyotes, as shown in Alg. 1.
4. Coyote expulsion and admission. Occasionally, a coyote leaves its pack and becomes solitary, or joins another pack instead; this occurs with probability \(p_{{\text{e}}}\):
$$ p_{{\text{e}}} = 0.005 \cdot N_{C}^{2} . $$(23)
The pseudocode of the proposed COA is described in Alg. 2.
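The steps above can be sketched as a minimal numerical illustration. All names are illustrative; \(p_{\text{s}} = 1/D\) and \(p_{\text{a}} = (1-p_{\text{s}})/2\) follow the original COA paper, and the exact form of the Gaussian global best-growing update shown here is an assumption rather than the paper's equations:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_pack(n_c, lb, ub):
    # Eq. (12): soc_{c,j} = lb_j + r_j * (ub_j - lb_j), r_j ~ U[0, 1]
    return lb + rng.random((n_c, lb.size)) * (ub - lb)

def cultural_tendency(pack):
    # Eq. (14): cult_j is the per-dimension median of the social conditions
    return np.median(pack, axis=0)

def grow(pack, fitness, gp):
    # Hedged form of the Gaussian global best-growing operator:
    # new_soc = soc + rn1 * (alpha - soc_cr1) + rn2 * (gp - soc_cr2),
    # with rn1, rn2 ~ N(0, 1) replacing COA's uniform random factors.
    n_c, d = pack.shape
    alpha = pack[np.argmin([fitness(c) for c in pack])]
    cr1, cr2 = rng.integers(0, n_c, 2)
    rn1, rn2 = rng.normal(size=2)
    new = pack + rn1 * (alpha - pack[cr1]) + rn2 * (gp - pack[cr2])
    # Greedy selection: keep a coyote's new condition only if it improves
    keep = np.array([fitness(a) < fitness(b) for a, b in zip(new, pack)])
    return np.where(keep[:, None], new, pack)

def breed(pack, lb, ub):
    # Eq. (20): each pup gene comes from parent 1, parent 2, or the environment
    n_c, d = pack.shape
    p_s = 1.0 / d              # scatter probability (original COA setting)
    p_a = (1.0 - p_s) / 2.0    # association probability
    cr1, cr2 = rng.choice(n_c, 2, replace=False)
    j1, j2 = rng.choice(d, 2, replace=False)
    r = rng.random(d)
    pup = lb + rng.random(d) * (ub - lb)   # R_j: random environmental gene
    take1 = r < p_s
    take1[j1] = True
    take2 = (r >= p_s + p_a) & ~take1
    take2[j2] = not take1[j2]
    pup[take1] = pack[cr1, take1]
    pup[take2] = pack[cr2, take2]
    return pup
```

The greedy keep-if-improved step in `grow` mirrors COA's rule that a coyote only adopts a new social condition when it is better adapted than the current one.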
To evaluate the robustness of the proposed COA algorithm, the Schaffer function is adopted to test its optimization performance, as shown in Fig. 1.
Figure 1 shows that the proposed COA method can find the global optimal solution effectively. The optimization result is close to the global extremum reference value, indicating that the proposed COA method has good optimization ability.
The improved BSO method
The solving speed and accuracy of BSO are higher than those of other optimization methods, and BSO has been used successfully in signal positioning and data classification. However, the original BSO has several drawbacks. First, it falls into a local optimum easily when solving complex optimization problems. Second, the computational complexity is very high when updating the parameters. Third, in multi-dimensional function optimization, relying on a single individual search alone increases the possibility of the algorithm running into a local optimum. Therefore, the BSO method is improved using tent mapping inverse learning and dynamic mutation.
For population initialization, tent mapping inverse learning is adopted to initialize the population as follows [26]:
1. In the search space, tent mapping is adopted to generate \(N\) positions \(x_{ij}\) of the beetle population as the initial population \({\text{OB}}\);
2. According to the definition of the inverse solution, the inverse position \(x_{ij}^{\prime }\) of each beetle position \(x_{ij}\) in the initial population \({\text{OB}}\) is generated, forming the inverse population \({\text{FB}}\);
3. The populations \({\text{OB}}\) and \({\text{FB}}\) are combined, the fitness values of the resulting \(2N\) beetles are sorted in ascending order, and the \(N\) beetles with the top \(N\) fitness values are selected as the initial population.
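The three initialization steps can be sketched as follows; the tent-map seed and the restart guard against its degenerate fixed point are illustrative assumptions:

```python
import numpy as np

def obl_initialize(n, d, a, b, fitness, x0=0.37):
    # Step 1: generate the initial population OB with the tent map
    x, ob = x0, np.empty((n, d))
    for i in range(n):
        for j in range(d):
            x = 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)
            if x == 0.0:
                x = 1e-3  # escape the tent map's fixed point at 0
            ob[i, j] = x
    ob = a + ob * (b - a)   # scale chaotic values into [a, b]
    # Step 2: inverse population FB via x' = a + b - x
    fb = a + b - ob
    # Step 3: keep the n fittest of the combined 2n candidates
    cand = np.vstack([ob, fb])
    fit = np.array([fitness(c) for c in cand])
    return cand[np.argsort(fit)[:n]]
```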
In later iterations of the beetle optimization method, the diversity of the population deteriorates. Therefore, a secondary optimization operation is performed on the beetle population using dynamic mutation.
In Eq. (24), \(\eta\) is the mutation weight and \(C\left( {0,1} \right)\) is a random value generated by the Cauchy operator with scale parameter 1. The pseudocode of the improved BSO is shown in Alg. 3.
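Since the mutation weight and the Cauchy term are the only ingredients stated here, one plausible element-wise form is sketched below; the additive structure and the idea of decaying \(\eta\) over iterations are assumptions, not the paper's exact Eq. (24):

```python
import numpy as np

def dynamic_mutation(x, eta, rng=None):
    # Assumed form: x' = x + eta * C(0, 1), where C(0, 1) is a standard
    # Cauchy variate and eta is the mutation weight (typically decayed
    # over the iterations so that late-stage perturbations stay local).
    rng = rng or np.random.default_rng()
    return x + eta * rng.standard_cauchy(x.shape)
```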
To demonstrate the effectiveness of the proposed BSO method, we test its convergence using the unimodal function \(\min f\left( x \right) = \sum\nolimits_{i = 1}^{n} {x_{i}^{2} }\) and compare it with the GA, PSO, DE, and original BSO methods. The test results are shown in Fig. 2. From the experimental results, the convergence speed and convergence accuracy are improved compared with the other four optimization approaches. With tent mapping inverse learning and dynamic mutation, the global and local search abilities are balanced, the convergence speed is accelerated, and the algorithm can jump out of local minima.
The proposed HCOBSO method
A new hybrid optimization approach, the hybrid Coyote Optimization and Beetle Swarm Optimization (HCOBSO) algorithm, is proposed. It combines the advantages of the improved COA and improved BSO methods and draws on the frog leaping algorithm (FLA) to improve the generalization performance. The detailed pseudocode of the proposed optimization method is given in Table 1.
The proposed HI-DKIELM
The detailed structure of the proposed HI-DKIELM is given in this section. The design of HI-DKIELM is based on KI-ELM and deep learning networks: HI-DKIELM consists of three parts, an input layer, an output layer, and several cascaded hidden layers, as shown in Fig. 3. During the training process, the HCOBSO optimization method is utilized to optimize the output weights to enhance robustness. For the \(k\)th hidden layer, the input feature \(X^{k}\) is obtained by subtracting the initial input data and then mapping the result with the kernel function. The detailed implementation process of the proposed HI-DKIELM is given in Table 2.
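Because the layer-by-layer computation is described only qualitatively here, the following sketch is a loose structural illustration: each hidden layer re-maps the current features through the kernel, and a final regularized solve produces the output weights. The feature propagation rule is an assumption standing in for the paper's subtract-then-kernel-map step:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def dkielm_forward(X, T, n_layers=2, C=10.0, gamma=1.0):
    # Cascade: each hidden layer's feature matrix is the kernel response
    # of the previous layer's features (an assumed stand-in for the
    # paper's residual coupling), followed by a KELM readout.
    F = X
    for _ in range(n_layers):
        F = rbf(F, F, gamma)
    K = rbf(F, F, gamma)
    alpha = np.linalg.solve(np.eye(len(X)) / C + K, T)
    return K @ alpha
```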
Results and discussion
Experimental settings
Several experiments are provided in diverse ways to demonstrate the effectiveness of the proposed HI-DKIELM approach.
The experiment is run on a PC with an Intel Core i7-8700 at 3.40 GHz and 16 GB RAM. The proposed method is implemented in Matlab 2013a, and the code for the other models comes directly from the code published by the respective authors. To verify the effectiveness and robustness of the HI-DKIELM algorithm, the experiment is divided into six parts:
1. First, in section “Evaluation of the performance of the HCOBSO optimization algorithm using the CEC2017 and CEC2019 database”, the CEC2017 and CEC2019 databases are adopted to test the performance and robustness of the HCOBSO optimization algorithm proposed in section “The proposed HCOBSO method”; it is compared with incomplete (ablated) versions of the proposed method to test the contribution of each improvement.
2. In section “Performance evaluation of the HCOBSO optimization algorithm on typical functions”, the proposed HCOBSO optimization method is tested on five typical optimization functions to check its optimization ability.
3. Section “Selection of hyperparameters” focuses on selecting the hyperparameters of HI-DKIELM, including the penalty factor \(C\) and the number of hidden nodes \(L\).
4. Section “Regression analysis” focuses on regression problems. The generalization performance of the HI-DKIELM algorithm is tested on 10 real UHI data sets and compared with the traditional ELM, I-ELM, EM-ELM, EI-ELM, and B-ELM.
5. Section “Classification analysis” focuses on classification problems. The generalization performance of the proposed HI-DKIELM algorithm is tested on ten real UHI data sets and compared with the traditional ELM, I-ELM, EM-ELM, EI-ELM, and B-ELM.
6. In section “Real-world application learning tasks”, the performance of HI-DKIELM on practical application learning tasks is verified and compared with other baseline methods.
Data and parameter settings
In these experiments, several databases are used in various experiments to verify the performance of the novel HI-DKIELM method.
For the experiment shown in sections “Evaluation of the performance of the HCOBSO optimization algorithm using the CEC2017 and CEC2019 database” and “Performance evaluation of the HCOBSO optimization algorithm on typical functions”, the CEC2017 database and five typical optimization functions are used to prove the optimization effectiveness of the proposed HCOBSO optimization method, respectively.
The CEC2017 database consists of unimodal functions (\(F_{1} - F_{3}\)), multimodal functions (\(F_{4} - F_{10}\)), hybrid functions (\(F_{11} - F_{20}\)), and composition functions (\(F_{21} - F_{30}\)). The public parameters of all algorithms are set identically to ensure the fairness of the experiment: the dimension \(D = 30\), and the maximum number of function evaluations is \(D \times 10{,}000\).
The CEC2019 database, known as “The 100-Digit Challenge”, is intended for use in the annual optimization competition and is used to evaluate algorithms on large-scale optimization problems. The first three functions, CEC01–CEC03, have various dimensions, whereas the CEC04–CEC10 functions are set as ten-dimensional minimization problems in the range \(\left[ { - 100,100} \right]\) and are shifted and rotated. All the CEC functions are scalable, and the global optima of these functions were unified toward point 1.
The five typical functions are shown in Table 3. \({\text{MaxDT}}\) is set to 100 and 500 for \(D = 10\) and \(D = 30\), respectively. The range of the solution of the typical function \(F_{n5}\) is set to \(\left[ { - 100,100} \right]\), while the ranges of the remaining functions are set to \(\left[ { - 30,30} \right]\).
In section “Evaluation of the performance of the HCOBSO optimization algorithm using the CEC2017 and CEC2019 database”, the best parameters for the COA optimization method are \(N = 100\), \(N_{c} = 5\), and \(N_{p} = 20\). The parameters of HCOBSO are set to \(N = 100\), \(N_{c} = 5\), \(N_{p} = 20\) in the earlier search stage and to \(N = 100\), \(N_{c} = 10\), \(N_{p} = 10\) in the later search stage.
In sections “Selection of hyperparameters” and “Regression analysis”, the UHI machine data set is adopted in the experiment with 13 regression problems and 14 classification problems. The data specifications are shown in Table 4. The relevant data of the training set and test set are represented by the notation #train and the notation #test, respectively.
Evaluation of the performance of the HCOBSO optimization algorithm using the CEC2017 and CEC2019 database
To test the contribution of each improvement to the HCOBSO method, the HCOBSO method is compared with some incomplete optimization algorithms such as HCOBSO5 (\(N_{c} = 5,\;N_{p} = 20\)), COA, BSO, ICOA, LPB [27], and FDO [28] on the CEC2017 and CEC2019 databases, as shown in Table 5. In the experiment, three functions are selected from each group. The rank method is used to compare the means: the rank improves as the mean decreases, and when the means are equal, the standard deviations are compared.
Table 5 shows that HCOBSO5 achieves the best mean and variance most often on the unimodal functions, indicating that the global search ability of the BSO method and the mining ability of the COA algorithm improve with the growth of \(N_{p}\). A large value of \(N_{p}\) means strong mining ability and a fast convergence rate. However, HCOBSO5 falls into local minima easily, and its optimization performance on the multimodal functions is not good. Among the five optimization methods, the average rank of HCOBSO is 1.58, while the average ranks of COA, BSO, ICOA, and HCOBSO5 are 4.16, 4.58, 2.75, and 2, respectively. The HCOBSO optimization method proposed in this paper achieves the best total rank among the five methods, while the total ranks of COA, BSO, ICOA, and HCOBSO5 are 4, 5, 3, and 2, respectively. These results show that HCOBSO obtains the best optimization performance and prove the effectiveness of the method.
From the results shown in Table 6, HCOBSO achieves the best mean and variance most often on the CEC2019 database, which means that with the growth of \(N_{p}\), the global search ability of the BSO method and the mining ability of the COA algorithm improve correspondingly: the larger the value of \(N_{p}\), the stronger the mining ability and the faster the convergence rate. Among the six optimization methods, the average rank of HCOBSO is 1.7, while the average ranks of LPB, FDO, COA, BSO, and HCOBSO5 are 3.3, 3, 4.5, 5.2, and 3.3, respectively. In the meantime, the HCOBSO optimization method proposed in this paper again achieves the best total rank among the six methods.
Performance evaluation of the HCOBSO optimization algorithm on typical functions
In this part, we compare the HCOBSO algorithm with four baseline optimization methods: COA, BSO, ICOA, and DE-MPGWO [15] using typical functions.
In this study, the number of iterations of each algorithm is set to 2000, and the overall scale of the four baseline optimization methods is the same, i.e., \(N_{p} = 40\). The experiments are performed 50 times, and the averaged results are taken as the final optimization results. Tables 7 and 8 show the optimization results of the COA, BSO, ICOA, DE-MPGWO, and HCOBSO methods on the five typical functions under \(D = 10\) and \(D = 30,\) respectively.
Tables 7 and 8 show that the compared methods all perform well on function \(F_{n2}\); the other results are summarized as follows: (1) the proposed HCOBSO gives more precise results than the COA, BSO, ICOA, and DE-MPGWO methods on the other four typical functions; (2) the BSO algorithm can quickly drop to the minimum and jump out of a local minimum within a short time.
To sum up, the HCOBSO optimization method proposed in this study has a good balance in search of accuracy and convergence speed and can improve search ability.
Selection of hyperparameters
In this part, we study the parameter selection of HI-DKIELM and the original ELM scheme. Compared with traditional neural network training algorithms, ELM and HI-DKIELM need fewer parameters. According to the introduction of HI-DKIELM in section “The proposed HI-DKIELM”, two specified parameters are crucial: the penalty factor \(C\) and the number of hidden nodes \(L\). The results are shown in Fig. 4.
The learning accuracies of ELM and HI-DKIELM in the \(L\) subspace are shown in Fig. 4a and b, where the parameter \(C\) is fixed. The learning accuracies of ELM and HI-DKIELM in the \(C\) subspace are shown in Fig. 4c and d, where the parameter \(L\) is fixed. Figure 4a and b shows that the proposed HI-DKIELM follows a convergence behavior similar to that of ELM but with higher testing accuracy; meanwhile, its performance is more stable over a wide range of \(L\). According to the results shown in Fig. 4c and d, the accuracy of HI-DKIELM changes slightly as \(C\) increases and then increases rapidly, converging to a better performance than the original ELM.
The value of \(L\) needs to be large enough; since further increasing \(L\) does not affect the accuracy curve, \(C\) can be selected using a network with few nodes.
Regression analysis
In this part, we test the performance of the HI-DKIELM method proposed in section “The proposed HI-DKIELM” on regression problems, using the actual UHI data sets in Table 4 to evaluate its generalization and robustness against four baseline ELM methods: enhanced random search-based ELM (EI-ELM), error minimization ELM (EM-ELM), incremental ELM (I-ELM), and bidirectional ELM (B-ELM). The number of initial hidden layer neurons and the number of hidden neurons added at each step are both set to one in the experiment, so the number of neurons in the hidden layer equals the number of iterations. In Table 9, we compare the testing root-mean-square error (RMSE) and training time on the regression problems.
The proposed method shows a generalization performance similar to that of the other ELMs with a much higher learning speed. For instance, on the Auto MPG dataset, the training time of the proposed HI-DKIELM method is only 0.0001 s: 156 times as fast as ELM, 2025 times as fast as I-ELM, 13,004 times as fast as EI-ELM, 75 times as fast as EM-ELM, and 3732 times as fast as B-ELM. On the Delta elevators dataset, the training time of the proposed HI-DKIELM method is about 0.0001 s: 616 times as fast as ELM, 4680 times as fast as I-ELM, 35,478 times as fast as EI-ELM, 431 times as fast as EM-ELM, and 453 times as fast as B-ELM. Furthermore, in this section, the performance of ELM and the proposed method is tested on the XOR problem, which has one training sample in each class. The aim of this simulation is to verify whether the proposed method can handle rare cases, such as those with extremely few training samples. The testing RMSE obtained by ELM is close to 0.0010 when 2000 hidden nodes are used, while the testing RMSE obtained by the proposed method is close to 0.0010 with only 87 hidden nodes.
In addition, the comparison of average testing error is shown in Fig. 5, where the \(x\)-axis represents the number of hidden nodes and the \(y\)-axis represents the average testing RMSE. Compared with the other ELM methods, the HI-DKIELM method has the best test performance. For instance, the test error obtained by HI-DKIELM on the Ca house dataset is half of that of ELM. Moreover, the test RMSE of the ELM method approaches 0.0010 with 2000 hidden nodes, while that of the HI-DKIELM method approaches 0.0010 with only 89 hidden nodes.
Classification analysis
In this part, we test the performance of the HI-DKIELM method proposed in section “The proposed HI-DKIELM” on classification problems, using the real-world UCI data sets given in Table 4 to evaluate its generalization and robustness, and compare it with the four baseline ELM methods.
As before, the number of initial hidden layer neurons and the number of hidden neurons added per iteration are both set to one, so the number of hidden neurons equals the number of iterations. We compare the testing accuracy and training time on the classification problems; the comparison of the testing accuracy is displayed in Fig. 6.
The results in Fig. 6 show that the proposed HI-DKIELM has significant advantages in training speed. On the Connect-4 dataset, which has a large number of training samples of medium input dimension, the proposed method runs 5600 times, 300 times, 45 times, and 1000 times as fast as I-ELM, EI-ELM, EM-ELM, and PC-ELM, respectively. On the Hill Valley dataset, which has a moderate number of training samples of moderate input dimension, the proposed method runs 3 times, 20 times, 25 times, and 1200 times as fast as I-ELM, EI-ELM, EM-ELM, and PC-ELM, respectively, with better performance.
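Training-time ratios like those above come from wall-clock measurements. A hypothetical harness for such measurements (not the authors' benchmarking code) might look like this:

```python
import time
import numpy as np

def median_fit_time(fit, X, y, repeats=5):
    """Median wall-clock time of a fit(X, y) callable over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit(X, y)
        times.append(time.perf_counter() - start)
    return float(np.median(times))

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))
y = rng.normal(size=1000)

# One-shot closed-form solve (ridge regression) as a stand-in for an
# ELM-style non-iterative training step
one_shot = lambda X, y: np.linalg.solve(X.T @ X + np.eye(X.shape[1]), X.T @ y)
print(f"median training time: {median_fit_time(one_shot, X, y):.6f} s")
```

Taking the median over repeated runs damps scheduler noise, which matters when the measured times are fractions of a millisecond, as for the non-iterative methods compared here.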
Real-world application learning tasks
In this section, we evaluate the performance of the proposed HI-DKIELM on a real-world application: an image classification task.
The original data are color images from the Corel data set. Each image is segmented, using the Blobworld system, into fragments that represent instances. Fragments containing specific visual content (e.g., an elephant) are labeled positive, while the remaining fragments are labeled negative; fragments (i.e., instances) from the same kind of image therefore form a binary learning problem. There are five image data sets: Tiger, Elephant, Fox, Bikes, and Cars, with 1096, 1259, 1474, 5215, and 5600 instances, respectively. The instances in the Tiger, Elephant, and Fox data sets are described by a 230-dimensional feature vector representing the color, texture, and shape of the region, while the instances in the Bikes and Cars data sets are represented by a 90-dimensional feature vector.
In image classification, visual content-based image retrieval is an important application — for example, finding pictures containing an elephant in a data set. Sample images from the benchmark data sets are shown in Fig. 7.
Detailed experimental results for the image classification tasks are given in Tables 10 and 11 in terms of classification accuracy (ACC) and area under the curve (AUC), respectively. As Tables 10 and 11 show, the proposed HI-DKIELM achieves the best ACC and AUC on all image datasets, indicating that HI-DKIELM is superior to the other methods in content-based image retrieval tasks. This strong performance is due to the many local approximations created by the proposed HI-DKIELM. The results also show that the naive Bayes (NB) method has the worst ACC and AUC on the Tiger, Elephant, Bikes, and Cars datasets, whereas the SVM method performs worst on the Fox data set. More detailed results for the other baselines can be found in Tables 10 and 11.
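The two metrics reported in Tables 10 and 11 are standard and can be computed without any external library. The sketch below (not from the paper) implements ACC directly and AUC via the rank-sum (Mann–Whitney) statistic, which equals the probability that a random positive instance is scored above a random negative one:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels (ACC)."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # average ranks for tied scores
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
print(accuracy(y, scores > 0.5), auc(y, scores))  # one pos/neg pair misordered
```

Unlike ACC, AUC is threshold-free, which is why both are reported: a classifier can rank the two classes well (high AUC) while a poorly chosen decision threshold still hurts ACC.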
Conclusion
In this paper, a novel ELM method named hybrid intelligent deep kernel incremental extreme learning machine (HI-DKIELM), based on the hybrid coyote and beetle swarm optimization (HCOBSO) method, is proposed. The proposed method has several features that distinguish it from existing ELM-based methods.
1. An improved coyote optimization algorithm (ICOA) has been proposed to improve searching efficiency and convergence, in which a Gaussian global best-growing operator replaces the original growing operator.
2. An improved beetle swarm optimization algorithm (IBSOA) has been designed based on tent mapping inverse learning and dynamic mutation strategies to prevent the BSO algorithm from falling into a local optimum.
3. A novel hybrid intelligent optimization algorithm, the hybrid coyote and beetle swarm optimization (HCOBSO) method, has been presented to optimize the parameters of the DKIELM. It combines the global search ability of the coyote optimization algorithm with the local search capability of the beetle swarm optimization algorithm.
In future work, there is room to further explore the proposed methods. First, it remains challenging to gain deeper insight into the deep learning capability of ELM. Second, from an optimization point of view, other novel heuristic algorithms, such as the five-element cycle optimization method, could be employed.
References
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
Huang G-B, Chen L, Siew C-K (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Zeng Y, Li Y, Chen J et al (2020) ELM embedded discriminative dictionary learning for image classification. Neural Netw 123:331–342
Dasgupta S, Stevens CF (2017) A neural algorithm for a fundamental computing problem. Science 358(6364):793–796
Wang W, Zhang R (2014) Improved convex incremental extreme learning machine based on enhanced random search. Electr Eng Electron Eng 238:2033–2040
Tang J, Deng C, Huang G-B (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821
Zhang R, Lan Y, Huang GB et al (2012) Universal approximation of extreme learning machine with adaptive growth of hidden nodes. IEEE Trans Neural Netw Learn Syst 23:365–371
Cui D, Huang G, Kasun LLC, Zhang G, Han W (2017) Elmnet: feature learning using extreme learning machines. In: 2017 IEEE international conference on image processing (ICIP) (pp 1857–1861)
Foroughi H, Ray N, Zhang H (2018) Object classification with joint projection and low-rank dictionary learning. IEEE Trans Image Process 27(2):806–821
Wang S, Zhu E, Yin J, Porikli F (2018) Video anomaly detection and localization by local motion based joint video representation and OCELM. Neurocomputing 277:161–175
Sun K, Mou S, Qiu J, Wang T, Gao H (2018) Adaptive fuzzy control for non-triangular structural stochastic switched nonlinear systems with full state constraints. IEEE Trans Fuzzy Syst 2018:1
Zhang W, Zhang Z, Wang L et al (2019) Extreme learning machines with expectation kernels. Pattern Recogn 96:1–13
Lekamalage CL et al (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25(8):3906–3918
Yang Y, Wu QMJ (2016) Extreme learning machine with subnetwork hidden nodes for regression and classification. IEEE Trans Cybern 46(12):2885–2898
Wu D, Qu ZS, Guo FJ, Zhu XL, Wan Q (2019) Hybrid intelligent deep kernel incremental extreme learning machine based on differential evolution and multiple population gray wolf optimization methods. Automatika 60(1):48–57
Wu D, Qu ZS, Guo FJ, Wan Q (2019) Hybrid multilayer incremental hybrid cost sensitive extreme learning machine with multiple hidden output matrix and subnetwork hidden nodes. IEEE Access 7:118422–118434
Scardapane S, Comminiello D, Scarpiniti M, Uncini A (2015) Online sequential extreme learning machine with kernels. IEEE Trans Neural Netw Learn Syst 26(9):2214–2220
Li S, Song S, Huang G, Wu C (2019) Cross-domain extreme learning machines for domain adaptation. IEEE Trans Syst Cybern Syst 49(6):1–14
Yu K, Liang J, Qu B, Luo Y, Yue C (2020) Dynamic selection preference-assisted constrained multiobjective differential evolution. IEEE Trans Syst Man Cybern Syst. https://doi.org/10.1109/TSMC.2021.3061698
Lakhdhar W, Mzid R, Khalgui M, Li Z, Frey G, Al-Ahmari A (2019) Multiobjective optimization approach for a portable development of reconfigurable real-time systems: from specification to implementation. IEEE Trans Syst Man Cybern Syst 49(3):623–637
Chen Q, Ding J, Yang S, Chai T (2020) A novel evolutionary algorithm for dynamic constrained multiobjective optimization problems. IEEE Trans Evol Comput 24(4):792–806
Zhou Y, Xiang Y, He X (2021) Constrained multi-objective optimization: Test problem construction and performance evaluations. IEEE Trans Evol Comput 25(1):172–186. https://doi.org/10.1109/TEVC.2020.3011829
Yu K, Liang J, Qu B, Yue C (2021) Purpose-directed two-phase multiobjective differential evolution for constrained multiobjective optimization. Swarm Evol Comput 60:100799
Yu K, Chen X, Wang X, Wang Z (2017) Parameters identification of photovoltaic models using self-adaptive teaching-learning-based optimization. Energy Convers Manag 145:233–246
Pierezan J, Coelho LDS (2018) Coyote optimization algorithm: a new metaheuristic for global optimization problems. In: 2018 IEEE congress on evolutionary computation (CEC). Rio de Janeiro, Brazil, IEEE
Ling T, Zhan Z-H, Wang YX, Wang ZJ, Yu WJ, Zhang J (2018) Competitive swarm optimizer with dynamic grouping for large scale optimization. In: Proc. IEEE Congr. Evol. Comput. (CEC 2018), Rio de Janeiro, Brazil, pp 2655–2660
Rahman CM, Rashid TA (2020) A new evolutionary algorithm: learner performance based behavior algorithm. Egypt Inf J. https://doi.org/10.1016/j.eij.2020.08.003
Abdullah JM, Ahmed T (2019) Fitness dependent optimizer: inspired by the bee swarming reproductive process. IEEE Access 7:43473–43486. https://doi.org/10.1109/ACCESS.2019.2907012
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments.
Funding
This work was supported by the National Natural Science Foundation of China (Grant no. 62006075), the Natural Science Foundation of Hunan Province of China (Grant no. 2019JJ50106), the Hunan Provincial Science and Technology Department (Grant no. 19A117), and the Science and Technology Innovation Program of Hunan Province (Grant no. 2020RC5019).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Wu, D., Li, T. & Wan, Q. A hybrid deep kernel incremental extreme learning machine based on improved coyote and beetle swarm optimization methods. Complex Intell. Syst. 7, 3015–3032 (2021). https://doi.org/10.1007/s40747-021-00486-8