1. Introduction
Mathematical modeling is a compelling strategy for predicting and recognizing the biologic behavior of a cell’s system. The mathematical model can create and predict the results of the empirical hypotheses that can be used to examine the process. In this regard, the biologic model studies the chemical metabolism and the pathways to construct a kinetic model that can give a clear understanding of a cell mechanism [
1].
Therefore, building and developing a kinetic model requires a large number of parameters such as kinetic parameters, potential of hydrogen (pH) and initial enzymatic and metabolite concentration. In the same vein, the building and the developing of the kinetics model gives a clear picture of cell behavior when it is needed in biotechnology applications [
2]. In addition, building and developing a kinetic model is a challenging task because it requires kinetic parameter estimation [
3].
Kinetic parameters estimation is the finding of the nearest value that minimizes the distance between the simulated and the real experimental results [
4]. Kinetic parameters estimation is a complex task because of the nonlinearity of the kinetic model, and it is usually measured in different conditions and time consumption [
5]. Usually, the kinetic parameters stored in the database are insufficient for building accurate kinetic model due to the cell behavior in different measurements stated by different laboratories utilizing various in vitro models and conditions [
6].
Normally, the model is built depending on time derivative expressions using Ordinary Differential Equations (ODEs) which describe the changes in a state or quantity of interest over time. Recently, several kinetic models of cell metabolism are based on ODEs which have been developed to detect time-dependent changes in metabolic concentration. For example, models of glycolysis and pentose phosphate (PP) pathways in
E. coli [
7,
8] and the tri-carboxylic acid (TCA) cycle in
Dictyostelium discoideum [
9] have been constructed. Furthermore, large scale model integration has been performed. For example, the model of [
7] has been integrated with the model of (TCA) cycle [
8,
10] and amino acid biosynthesis [
11]. Researchers have studied the glycolysis pathway to clone a new host and observe its changes and effects [
12].
Therefore, because of the difficulty in estimating kinetic parameters, many researchers lately have used the metaheuristic optimization algorithms’ methods to estimate the kinetics of the
E. coli model and other biologic models. Some of these metaheuristic algorithms were used in order to estimate the kinetic parameters employed by [
13,
14,
15,
16,
17,
18,
19,
20,
21] and some of these algorithms have used experimental data taken from [
7,
22,
23] to investigate their abilities in the kinetic parameters estimation.
In this regards, the biologic kinetic models contain hundreds to thousands of kinetic parameters which have caused serious challenges in parameter estimation due to the high dimension of search space that is required to be explored. Thus, in high dimensional kinetic models (containing hundreds to thousands of kinetic parameters), the procedures are considered to be expensive in terms of computational cost and the performance of the aforementioned algorithms tend to deteriorate, which may lead to low accuracy [
6].
The Differential Evolution (DE) algorithm is an example that is widely used and studied in parameter estimation of metabolic model [
6,
17]. Time consumption is the main weakness of (DE) algorithm; this makes it difficult for the (DE) algorithm to tune its parameters when it involves a large number of processors and multiple local searches. In addition, the Genetic Algorithm (GA) belongs to the class of evolutionary algorithms which almost share the same idea as DE. This algorithm is a popular method that is used in parameter estimation of metabolic model. The main weakness of this algorithm is the high computational time which mostly does not perform effectively when it is compared to DE and Particle Swarm Optimization (PSO) algorithms [
24,
25,
26].
Since 1995, the PSO algorithm has been making great impact in solving many problems in different fields such as [
14,
27,
28,
29]. This method was proposed by [
30] and it was adopted in simulating flock of birds and a school of fish in order to locate their food. The algorithm particles during their movement in the process of searching share the information with each other and then they search for their destination stochastically and independently. Therefore, the particles’ movements in the search space move towards the optimum solution by taking different paths and directions.
Moreover, the method of Se-PSO was proposed and developed based on the PSO algorithm in order to increase the computational time, and the accuracy of PSO was presented by [
31]. This development is based on the recognition of the local and global point problem of PSO [
30]. In this regard, the segmentation divides the particles into groups so as to move together towards the optimal solution. This algorithm was adapted for large-scale kinetic parameters of
E. coli model [
32] and governor–turbine model identification of single area power plant [
28,
32] was used in a linear inertia weight. Due to the nonlinearity of the model which uses linear inertia weight (
) by [
32], this inertia weight affects the exploration and exploitation of the particles. Consequently, there is need for large exploration at the beginning and small exploitation at the end of the algorithm execution. The purpose for this is to avoid the local optima trap so as to increase the efficiency of the kinetic parameters estimation with a view to minimizing the model distances in reasonable time.
In this study, the method of Se-PSO algorithm was enhanced by adding a damping process which helped the particles to have a wide exploration at the beginning and small exploitation at the end during the algorithm execution. The aim was to investigate the amount of each solution to be found at the search space. Thus, the enhanced Se-PSO was adapted to estimate large-scale kinetic parameters of E. coli and to increase the efficiency of the Se-PSO algorithm in terms of exploration and exploitation.
The other sections of the study are organized as follows:
Section 2 describes the problem formulation.
Section 3 describes the materials and methods.
Section 4 describes the obtained results.
Section 5 gives the conclusion of the findings of this study.
4. Results
The estimation of large-scale kinetic parameters in the kinetic model was a difficult task due to the nonlinearity of the model. For this reason, the kinetic parameters were reported from different laboratories and time consumption [
32]. It was stated that during the application of the local sensitivity analysis to [
14] model result, each kinetic parameter increased up to 200% by step 0.5 to calculate its sensitivity. It was found out that 7 kinetic parameters were highly affected by the used model’s response for estimating the proposed algorithm. These sensitive kinetic parameters are
,
,
,
,
,
and
which were involved in this reaction rate of
. In addition, the experimental data of [
16] were taken. These experimental data are
,
,
,
,
,
,
,
,
,
,
and
.
As a matter of fact, during the ESe-PSO algorithm execution, the segments of the kinetic parameters of [
32] were increased by adding 1 segment to each kinetic to increase the possibility of searching the wide-area and finding an accurate solution which is depicted in
Appendix A Table A3.
The ESe-PSO algorithm parameters such as
was initialized as mentioned in [
32],
and
. In this regard, the kinetic boundaries with their upper and lower bounds values were initialized with small increase based on [
32] to allow the proposed algorithm search large space. Here, the mechanism for selecting the boundaries was based on the original kinetic parameters values, which assumed that these values are
with each kinetics’ small value, so that the optimum result may be around it. These kinetic parameters boundaries are depicted in
Table 2 with their estimation as follows:
After the kinetic parameters were initialized, the kinetic parameters were segmented to find the optimal segment for each kinetics. Later on, the updating of position, velocity and the fitness function was done based on the initialization of ESe-PSO parameters such as the inertia weight, damping process and the acceleration coefficient (). Thus, the optimal segment was determined based on the fitness function. The optimal segments were sent to the PSO algorithm which provided a local best solution so that PSO would search around the optimal segments.
In this regard, the ESe-PSO algorithm estimated the kinetic parameters and minimized the model distance to 28.94%. The model simulation was moved closer to the real experimental data when compared to the considered model; the Se-PSO, GA and DE algorithms are described in
Table 3. The GA parameters were set as follows: generation gap = 0.8, crossover = 0.85, mutation = 0.05 [
33] and maximum generation = 500. The DE parameters were set as follows: scaling factor
, crossover rate
, search strategy was DE/rand/1/bin as stated in [
6], population size = 100 and a number of generation = 500 [
32].
Table 3 shows how the proposed method moved the model simulation towards the experimental data closely. This simulation increased
metabolites and this may be due to the other pathways engaged during the execution of the algorithm such as anaplerotic, TCA, with acetate formation. In addition, this increase may be the
,
and
metabolites that were affecting the cell’s growth [
10].
The analysis of the model distance minimization showed that 12 out of the 12 datasets were moving towards the real experimental data. These metabolites are
,
,
,
,
,
,
,
,
,
,
and
thus, the metabolites were still slightly far from the experiment, perhaps because of the model complexity, shortage of data and the portion of some metabolites. Furthermore, the G6P empirical data were 3.48
and the simulation outcome of G6P could be leading to the empirical data rather than the G6P result model under examination when related to the G6P model. This discrepancy may be because of the model resolution of G6P consistency in the presence of
and
after the addition of
of
as reported by [
10].
Moreover, the
model simulation was having a better movement towards the experimental data rather than the considered model. This may be due to the nonlinearity of the system or the lumping of
and
metabolites in one equation. The changes in
may be due to the lumping of
and
as stated in [
10].
As stated in
Table 3 above, the simulation response of the distance model minimization was moving towards the experimental data in 12 metabolites. Moreover, the model under investigation had five pathways than the data of only two pathways where only one metabolite was not well minimized (
) due to the lack of experimental data. The estimated kinetic parameters were used in the model simulation to compare our model responses to [
10] model responses. The result of the comparison is depicted in
Figure 2 below:
In order to prove that the result is statistically consistent, the STD, mean and F value were calculated for the considered model. The result of the simulated model is displayed in
Table A3 and Equations (3)–(6).
In
Appendix A Table A4, the mean and STD were calculated for the proposed algorithm results and the response of the considered model’s result. The proposed algorithm showed 0.6855 mean compared to the considered model (0.5251), DE (0.6189) and GA (0.6216) which indicated that the proposed algorithm result moved to the experimental data mean of 0.9647. In addition, DE algorithms showed a better result than GA in the mean values.
In addition, the proposed algorithm showed 0.9320 STD compared to the considered model’s 0.9198 STD which indicated that the proposed algorithm result moved to the experimental data STD of 1.2271. Thus, the hypothesis of the result in
Table 4 is calculated as follows:
The hypothesis of
Table A4 is calculated and decided below:
where
is the standard deviation of the optimized result
,
is the standard deviation of the considered model
and the
are the number of variables for the optimized and model result. This hypothesis was locking for minimizing the considered model’s errors. This suggests that we accepted
and rejected
as the accepted result. Thus, the F test showed that the proposed method had moved to the critical
F value better than Se-PSO, DE and GA by indicating1.0482, 1.0497, 1.2881 and 1.3156, respectively.
The objective function and time consumption were calculated for the proposed algorithm, DE and GA as depicted in
Table 4 below:
Table 4 demonstrates a comparison between the proposed algorithm with DE and GA algorithms. Table indicates that the proposed algorithm in this study produced superior results in terms of objective function values over 10 runs. The mean, STD, best and lowest objective functions were calculated. ESe-PSO algorithm produced the best mean (7.035E–05) and STD (0.000272) when compared to Se-PSO (mean 0.000603; STD 0.003567), DE (mean 0.049185; STD 0.251213) and GA, (mean 0.11476; STD 0.258243). The proposed method achieved the best objective function (2.3E-06) while Se-PSO ((0.00352), DE (0.00827) and GA (0.002579) achieved the best objective function. The consumed time by the proposed algorithm was achieved in 19 h while Se-PSO, DE and GA achieved the result in 19, 26 and 29 h, respectively.