1 Introduction

The soil shear strength can be defined as the magnitude of shear stress that a soil is capable of withstanding [1]. In civil engineering, the shear strength of soils is fundamental for describing their susceptibility to the pressures applied by building loads and construction machinery. Geotechnical engineers use the shear strength of soil as an essential factor when evaluating the stability of structures on or embedded in the ground, such as retaining walls, embankments, airfield pavements, and foundations of high-rise buildings [2]. Therefore, obtaining an accurate estimation of the shear strength of soil is a highly important task in various geotechnical designs [3,4,5].

In practice, computing this parameter of interest faces various difficulties because the mapping function between the shear strength and soil properties has been shown to be complicated [6,7,8,9,10,11,12]. Freitag [13] found a complicated dependency between soil strength and the moisture content and bulk density; thus, estimating soil strength from such factors is by no means an easy task. Conventional estimations of shear strength depend on the values of the cohesion (c) and the angle of internal friction (φ). However, there are inconsistencies in the values of the cohesion and the angle of internal friction for a particular soil [14].

Fredlund et al. [15] attempted to capture the variation of the soil shear strength via a soil-water characteristic curve and the saturated shear parameters of the soil. However, the closed-form predictions are not applicable to all types of soils. Gao et al. [3] pointed out that the shear strength of soil varies significantly with soil type. Moreover, performing laboratory tests (e.g., with triaxial equipment) to estimate the shear strength of soils is both time-consuming and costly due to the required instruments and highly trained technicians [16].

Due to such challenges, alternative methods based on advanced machine learning models have recently been proposed to deal with the task of soil shear strength prediction. Based on existing samples collected from laboratory tests, intelligent models can be developed and trained to perform shear strength calculations automatically and instantly. Through a direct comparison between actual testing outcomes and the results predicted by those intelligent models, the reliability of machine learning models can be objectively judged.

The adaptive neuro-fuzzy inference system (ANFIS) has been employed to predict the shear strength of unsaturated soils [17]; this fuzzy neural network model was shown to be a capable tool for the problem of interest. A hybrid model consisting of artificial neural networks and support vector machines with various ensemble strategies (i.e., voting, bagging, and stacking) was proposed by Chou and Ngo [18] to compute the strength of fiber-reinforced soil. Chen et al. [19] proposed a modified linear regression method for modeling the variation of shear strength based on soil properties. Mbarak et al. [20] predicted the undrained shear strength of soil using random forest, gradient boosting, and stacked models.

Recently, metaheuristic approaches have been increasingly harnessed to create intelligent models to cope with the task at hand. Pham et al. [21] replaced the conventional gradient-descent-based algorithm used for training ANFIS models with well-established metaheuristic algorithms, including particle swarm optimization and the genetic algorithm. Tien Bui et al. [22] recently combined the least squares support vector machine and the cuckoo search optimization metaheuristic for predicting the shear strength parameter of soft soil. Moayedi et al. [23] investigated the feasibility of the spotted hyena optimizer and ant lion optimization for training neural networks used to estimate the soil parameter of interest. Nhu et al. [24] established an integrated model of support vector regression and particle swarm optimization relying on various soil features such as moisture, density, components, void ratio, and water content. The dragonfly algorithm, whale optimization algorithm, invasive weed optimization, elephant herding optimization, shuffled frog leaping algorithm, salp swarm algorithm, and wind-driven optimization have been employed to construct capable neural network-based prediction models [25, 26]. Metaheuristics have shown their capability of improving the predictive accuracy of neural network models by overcoming the drawbacks of conventional backpropagation and gradient descent methods [27,28,29].

In addition, in the fields of data science and civil engineering, there is an increasing trend of applying ensemble learning methods to solve complex data modeling tasks [18, 30,31,32,33]. This approach employs multiple learning paradigms to construct combined models featuring excellent predictive capability. Ensemble models have been shown to attain better prediction accuracy and to alleviate the overfitting phenomenon [34,35,36,37]. A machine learning ensemble is generally constructed by aggregating the results produced by several alternative models; therefore, the trained ensemble model often has good flexibility in data modeling and strong resilience to noise [38,39,40].

Following these two trends of research, this study proposes a novel metaheuristic-optimized machine learning ensemble approach for predicting the soil shear strength. The ensemble includes two capable individual machine learning methods, the radial basis function neural network (RBFNN) [41, 42] and multivariate adaptive regression splines (MARS) [43, 44], which are powerful tools for nonlinear and multivariate data modeling. The two techniques also represent different types of machine learning; thus, they are likely to contribute their reciprocal strengths and cover each other's weaknesses in building a meta-ensemble model with outstanding efficiency [45]. In this integrated framework, individual RBFNN and MARS models are trained and their prediction results are combined via the stacking aggregation method [35].

In addition to simultaneously tuning the hyper-parameters of the constituent models in the ensemble, the weight of each individual model's output must be determined adaptively. The training phase of the proposed meta-ensemble model is therefore enhanced by the artificial electric field algorithm (AEFA), a state-of-the-art metaheuristic approach. The AEFA [46] is inspired by the concepts of the electric field, charged particles, and Coulomb's law of electrostatic force. This algorithm has achieved competitive performances against other optimization methods in terms of both solution quality and convergence speed. Hence, the AEFA is expected to optimize the performance of the stacking-based ensemble model by simultaneously searching for the most suitable set of hyper-parameter values and weights of the constituent models.

A dataset of 249 data samples collected from the geotechnical investigation process in Hanoi (Vietnam) is used to train and test the proposed approach. The variables of the depth of the sample, sand percentage, loam percentage, clay percentage, moisture content percentage, wet density, dry density, void ratio, liquid limit, plastic limit, plastic index, and liquidity index are employed as influencing factors.

All in all, the goal of this research is to construct and verify a hybrid metaheuristic-optimized meta-ensemble learning model, denoted as MOMEM, to enhance the prediction accuracy of the soil shear strength. The contribution of this study is multifold: (1) a novel machine learning ensemble for predicting soil shear strength is proposed; (2) the model is adaptively constructed with the AEFA metaheuristic; (3) a superior predictive performance is achieved compared to benchmark machine learning models. The rest of the study is organized as follows: the collected dataset is described in the second section. The next section reviews the research methods, followed by a section that presents the newly developed machine learning framework. Experimental results and comparisons are reported in the fifth section. The final section provides several concluding remarks.

2 The collected dataset

The dataset used in the present study was collected during the geotechnical investigation phase of the Le Trong Tan Geleximco Project, located in the west of Hanoi, Vietnam (Fig. 1). The site investigation was conducted in April 2009. The project covers an area of approximately 135 ha, used for the construction of low-rise housing, high-rise housing, public infrastructure, and entertainment centers. To gather information on the soil conditions, boring-based soil sampling was employed. The boreholes were drilled with the aid of slurry (a mixture of bentonite and water) and thin-walled metal tubes to ward off soil collapse. Soil samples with a diameter of 91 mm were gathered with piston samplers. The sample collection process complies with the Vietnamese national standards TCXDVN-194-2006 (High Rise Building—Guide for Geotechnical Investigation) and TCN-259-2000 (the procedure for soil sampling by boring methods).

Fig. 1 Location of the study site (Zone A—Le Trong Tan Geleximco, Hanoi, Vietnam)

There were 65 boreholes with a total of 249 soil samples collected from the geotechnical investigation process. The depth of the collected soil samples ranges from 1.20 to 39.5 m. The factors measured from soil samples are (1) depth of sample (m), (2) sand percentage (%), (3) loam percentage (%), (4) clay percentage (%), (5) moisture content percentage (%), (6) wet density (g/cm3), (7) dry density (g/cm3), (8) void ratio, (9) liquid limit (%), (10) plastic limit (%), (11) plastic index (%), and (12) liquidity index. These 12 factors are employed as conditioning variables to estimate the shear strength of the soil. Descriptive statistics of the soil variables in this study are shown in Table 1. Figure 2 graphically presents a correlation of soil-shear-strength with its conditioning factors.

Table 1 Input and output parameters
Fig. 2 Distributions of the soil parameters used in this research

3 Methodology

3.1 Radial basis function neural network (RBFNN)

A typical RBFNN model [41, 42] used for nonlinear function approximation consists of three layers, as graphically presented in Fig. 3. The first layer has a number of neurons equal to the number of influencing factors, which is 12 in this study. The next layer comprises a set of radial basis function (RBF) neurons, which are the basic units of information processing. Each of these neurons employs an RBF, a non-negative, nonlinear function that attenuates with distance and is radially symmetric about its center; indeed, any function that depends only on the distance from a center vector is radially symmetric about that vector. The last layer, the output layer, yields the network output by performing a linear combination of the outputs produced by the units in the second layer.

Fig. 3 Structure and pseudocode of RBFNN

The RBF used in the second layer is mathematically presented as follows [47]:

$$\varphi (x) = \exp \left( { - \frac{{(x - v)^{2} }}{{2\sigma^{2} }}} \right),$$
(1)

where \(v\) and \(\sigma\) denote the center (position) and width, respectively, of the RBF node.

It should be noted that the number of output neurons in this study is 1, corresponding to the shear strength of the soil. Accordingly, the soil shear strength can be computed via the RBFNN model as follows [48,49,50]:

$$y = \mathop \sum \limits_{j = 1}^{{N_{n} }} w_{j} \varphi_{j} \left( {x - \nu_{j} } \right) + b,$$
(2)

where x = (x1, x2, …, xu) denotes a u-dimensional input vector; x − νj is the Euclidean distance between the center of the jth hidden node and a data point; wj represents the connecting weight from the jth hidden node to the output layer; b is the bias term; Nn denotes the number of hidden neurons; and \(\varphi_{j} \left( \bullet \right)\) is the radial basis function of the jth hidden node.

It is noted that the model construction phase of the RBFNN requires a proper determination of the number of hidden neurons (Nn) and the RBF width (σ). An appropriate set of Nn and σ ensures the success of establishing a robust RBFNN model for soil shear strength estimation. The parameter Nn dictates the architecture of the inference model, resulting in either an over-fitted model (if too many RBF nodes are assigned) or an under-fitted model with low accuracy (if too few RBF nodes are used) [51, 52]. Additionally, the parameter σ governs the influence of the RBF nodes on each data point; therefore, it determines the generalization capability of the prediction model.
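To make the computations in Eqs. (1) and (2) concrete, the following minimal sketch evaluates the forward pass of a Gaussian RBFNN. It assumes the centers vj, width σ, weights wj, and bias b have already been fitted; all values and dimensions are illustrative placeholders rather than the models trained in this study.

```python
# Minimal sketch of the RBFNN forward pass in Eqs. (1)-(2); numpy only.
import numpy as np

def rbfnn_predict(X, centers, sigma, weights, bias):
    """Eq. (2): y = sum_j w_j * phi_j(x - v_j) + b, with Gaussian phi from Eq. (1)."""
    # Squared Euclidean distance from every sample to every RBF center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n_samples, N_n)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))                         # hidden-layer activations
    return phi @ weights + bias                                    # linear output layer

# Toy usage: 12 input factors (as in this study), N_n = 5 hidden nodes
rng = np.random.default_rng(0)
X = rng.random((4, 12))                    # 4 samples, 12 soil factors each
centers = rng.random((5, 12))              # hypothetical RBF centers
y_hat = rbfnn_predict(X, centers, sigma=1.0, weights=rng.random(5), bias=0.1)
print(y_hat)                               # one predicted value per sample
```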

3.2 Multivariate adaptive regression splines

MARS, developed by Friedman [43], is formed by fitting basis functions (terms) to distinct intervals of the input variables. In general, splines (also called piecewise polynomials) consist of pieces smoothly connected at joining points called knots (k). MARS uses two-sided truncated power functions as spline basis functions, described in Eqs. (3) and (4):

$$[ - (x - k)]_{ + }^{q} = \left\{ {\begin{array}{*{20}l} {(k - x)^{q} } \hfill & {{\text{if }}x < k} \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
(3)
$$[ + (x - k)]_{ + }^{q} = \left\{ {\begin{array}{*{20}l} {(x - k)^{q} } \hfill & {{\text{if }}x \ge k} \hfill \\ 0 \hfill & {{\text{otherwise}},} \hfill \\ \end{array} } \right.$$
(4)

where \(q \ge 0\) is the power of the resulting function estimate; the subscript \([\, \cdot \,]_{+}\) indicates that only the positive part is taken (the function is zero otherwise).

An interaction term is produced by multiplying an existing term with a truncated linear function involving a new attribute. Accordingly, both the existing term and the newly created interaction term are involved in the construction of the MARS model. The search for new terms is limited to a maximum interaction order specified by the user. The formula of an interaction term and the general MARS function are given in Eqs. (5) and (6), respectively:

$$B_{m} (x) = \mathop \prod \limits_{j = 1}^{{K_{m} }} [s_{m,j} \times (x_{v(m,j)} - k_{m,j} )]_{ + } ,$$
(5)
$$\hat{y} = \hat{f}_{M} (x) = c_{0} + \mathop \sum \limits_{m = 1}^{M} c_{m} T_{m} (x),$$
(6)

where Km is the number of truncated linear functions multiplied in the mth term and is bounded by the maximum interaction order among variables, Imax, which is pre-defined by the user; xv(m,j) is the input variable corresponding to the jth truncated linear function in the mth term; km,j is the knot value of xv(m,j); sm,j is the selected sign, + 1 or − 1; \(\hat{y}\) is the dependent variable predicted by the MARS model; c0 is a constant; Tm(x) is the mth term, which may be a single spline basis function; and cm is the coefficient of the mth term.

A MARS model is constructed through a two-stage process comprising a forward phase and a backward phase. The forward phase adds terms until the maximum number of terms (Tmax), pre-assigned by the user, is reached. This phase aims at reducing the sum-of-squares residual error and tends to result in a complex, over-fitted model. Hence, a backward phase is needed to remove redundant terms that contribute little to the model's generalization. The generalized cross-validation (GCV) criterion is used to recognize those redundant terms, as shown in Eq. (7):

$${\text{GCV}}(M) = (1/n) \times \mathop \sum \limits_{i = 1}^{n} [y_{i} - \hat{f}_{M} (x_{i} )]^{2} /(1 - C(M)/n)^{2}$$
(7)

where n is the number of data patterns and C(M) is a dynamic complexity penalty defined in Eq. (8):

$$C(M) = (M + 1) + d \times M,$$
(8)

where M is the number of terms in Eq. (8); the parameter d is a penalty for each term added into the model and is pre-assigned by the user. Notably, an unsuitable value of d can lead to an under-fitted or over-fitted model with poor generalizability [53,54,55]. In general, Tmax (denoted Mmax in the following sections), Imax, and d are regarded as the essential factors controlling the MARS model's performance.
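As a concrete illustration of the above, the sketch below implements the piecewise-linear hinge functions of Eqs. (3) and (4) (with q = 1, the setting commonly used by MARS) and the GCV criterion of Eqs. (7) and (8). The knot, coefficients, and data are hypothetical and only serve to show how the quantities fit together.

```python
# Sketch of the MARS building blocks: hinge basis functions and the GCV criterion.
import numpy as np

def hinge_neg(x, k):      # Eq. (3) with q = 1: [-(x - k)]_+  ->  (k - x) if x < k, else 0
    return np.maximum(k - x, 0.0)

def hinge_pos(x, k):      # Eq. (4) with q = 1: [+(x - k)]_+  ->  (x - k) if x >= k, else 0
    return np.maximum(x - k, 0.0)

def gcv(y, y_hat, n_terms, d):
    """Eqs. (7)-(8): mean squared error inflated by the complexity penalty C(M)."""
    n = len(y)
    mse = np.mean((y - y_hat) ** 2)
    c_m = (n_terms + 1) + d * n_terms
    return mse / (1.0 - c_m / n) ** 2

# Toy usage: a two-term piecewise-linear MARS surface with a single knot at 0.4
x = np.linspace(0.0, 1.0, 50)
y_model = 0.3 + 1.2 * hinge_pos(x, 0.4) - 0.8 * hinge_neg(x, 0.4)
y_observed = y_model + np.random.default_rng(1).normal(0.0, 0.05, size=x.shape)
print(gcv(y_observed, y_model, n_terms=2, d=3.0))   # lower GCV means a better trade-off
```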

3.3 Artificial electric field algorithm

The artificial electric field algorithm (AEFA) was developed in [46] and is inspired by Coulomb's law of electrostatic force, which states that the electrostatic force between two charged particles is directly proportional to the product of their charges and inversely proportional to the square of the distance between their positions. In the AEFA, an agent is regarded as a charged particle whose strength is measured by its electric charge. A particle can either attract or repel others through an electrostatic force, which is used as the communication channel among particles. A particle with better fitness possesses a greater electrostatic force. In the AEFA, the attractive electrostatic force is regulated so that the particle with the greatest charge attracts all particles with lower charges and moves slowly in the search space. Figure 4 presents the pseudo-code of the AEFA.

Fig. 4 Artificial electric field algorithm's pseudo-code

The AEFA first randomly locates N particles in a d-dimensional search space, with the position of the ith particle denoted as \(X_{i} = [x_{i}^{1} , x_{i}^{2} , x_{i}^{3} , \ldots , x_{i}^{d} ]\), where \(i \in \{1, 2, 3, \ldots , N\}\). The best-so-far position of the ith particle is updated as follows:

$$p_{i}^{d} (t + 1) = \left\{ {\begin{array}{*{20}l} {p_{i}^{d} (t)} \hfill & {{\text{if }}f(P_{i} (t)) < f(X_{i} (t + 1))} \hfill \\ {x_{i}^{d} (t + 1)} \hfill & {{\text{if }}f(X_{i} (t + 1)) \le f(P_{i} (t)).} \hfill \\ \end{array} } \right.$$
(9)

The location of the best particle, i.e., the one with the best fitness, is denoted by Pbest = Xbest. The force acting on the ith charge from the jth charge is calculated as follows:

$$F_{ij}^{d} (t) = K(t)\frac{{Q_{i} (t)Q_{j} (t)(p_{j}^{d} (t) - X_{i}^{d} (t))}}{{R_{ij} (t) + \epsilon }},$$
(10)

where Qi(t) and Qj(t) are the charges of the ith and jth particles at time t, respectively; K(t) is the Coulomb constant at time t; ϵ is a small positive constant; and Rij(t) is the Euclidean distance between the ith and jth particles. The Coulomb constant K(t) is given as follows:

$$K(t) = K_{0} \exp \left( { - \alpha \frac{{\text{iter}}}{{\text{maxiter}}}} \right),$$
(11)

where α is a constant and K0 is the initial value of the Coulomb constant; iter is the current iteration and maxiter is the maximum number of iterations.

The total electric force acting on the ith particle, exerted by all the other particles, is given by the following equation:

$$F_{i}^{d} (t) = \mathop \sum \limits_{j}^{N} {\text{rand}}()F_{ij}^{d} (t),\quad {\text{with }}i \ne j,$$
(12)

where rand() is a uniform random number in the range [0, 1].

The electric field of the ith particle at any time t is given by the following equation:

$$E_{i}^{d} \left( t \right) = F_{i}^{d} \left( t \right)/Q_{i}^{d} \left( t \right).$$
(13)

The acceleration of the ith particle at any time t is given by the following equation:

$$a_{i}^{d} \left( t \right) = \frac{{Q_{i} \left( t \right)E_{i}^{d} \left( t \right)}}{{M_{i} \left( t \right)}},$$
(14)

where Mi(t) is the unit mass of the ith particle at any time t. The velocity and position of the ith particle are updated as follows:

$$V_{i}^{d} \left( {t + 1} \right) = {\text{rand}}() \times V_{i}^{d} \left( t \right) + a_{i}^{d} \left( t \right)$$
(15)
$$X_{i}^{d} \left( {t + 1} \right) = X_{i}^{d} \left( t \right) + V_{i}^{d} \left( {t + 1} \right),$$
(16)

where rand() is a uniform random number in the interval [0, 1].

The charge of a particle is calculated from the fitness function, with the assumption of equal charges for all particles as shown in Eq. (17). Fundamentally, the charge of the best particle should have the greatest value, while the charges of the other particles lie in the range [0, 1]. Hence, the particle with the best fitness value carries the largest charge and exerts the greatest force:

$$Q_{i} = Q_{j}\quad i,j \in \left[ {1,2,3, \ldots ,N} \right]$$
(17)
$$q_{i} (t) = \exp \left( {\frac{{{\text{fit}}_{{p_{i} }} (t) - {\text{worst(}}t )}}{{{\text{best}}(t) - {\text{worst}}(t)}}} \right)$$
(18)
$$Q_{i} \left( t \right) = \frac{{q_{i} \left( t \right)}}{{\mathop \sum \nolimits_{i = 1}^{N} q_{i} \left( t \right)}}$$
(19)

where \({\text{fit}}_{{p_{i} }} (t)\) is the fitness value of the ith particle at time t; for a minimization problem, \({\text{best}}(t) = \min_{i} ({\text{fit}}_{{p_{i} }} (t))\) and \({\text{worst}}(t) = \max_{i} ({\text{fit}}_{{p_{i} }} (t))\).
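The sketch below condenses Eqs. (9)–(19) into a single AEFA loop for a minimization problem. The objective function, population size, bounds, and constants (K0, α) are illustrative placeholders and not the settings used in this study.

```python
# Condensed single-loop sketch of the AEFA update rules (Eqs. 9-19), minimization.
import numpy as np

rng = np.random.default_rng(42)
N, dim, max_iter = 20, 7, 50               # e.g. seven decision variables, as in the MOMEM
K0, alpha, eps = 500.0, 30.0, 1e-12        # illustrative constants

def objective(x):                          # placeholder objective (sphere function)
    return np.sum(x ** 2, axis=-1)

X = rng.uniform(-1.0, 1.0, size=(N, dim))  # particle positions
V = np.zeros_like(X)                       # velocities
P = X.copy()                               # best-so-far positions (Eq. 9)
fit_p = objective(P)

for it in range(max_iter):
    fit = objective(X)
    better = fit <= fit_p                                   # Eq. (9)
    P[better], fit_p[better] = X[better], fit[better]

    best, worst = fit_p.min(), fit_p.max()
    q = np.exp((fit_p - worst) / (best - worst - eps))      # Eq. (18), minimization
    Q = q / q.sum()                                         # Eq. (19)
    K = K0 * np.exp(-alpha * it / max_iter)                 # Eq. (11)

    F = np.zeros_like(X)                                    # Eqs. (10) and (12)
    for i in range(N):
        for j in range(N):
            if i != j:
                R = np.linalg.norm(X[i] - X[j])
                F[i] += rng.random() * K * Q[i] * Q[j] * (P[j] - X[i]) / (R + eps)

    a = F                                                   # Eqs. (13)-(14): a = F/M, unit mass
    V = rng.random((N, dim)) * V + a                        # Eq. (15)
    X = X + V                                               # Eq. (16)

print("best fitness found:", fit_p.min())
```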

4 The proposed metaheuristic-optimized meta-ensemble learning model for soil-shear-strength estimation

This section describes the overall structure of the metaheuristic-optimized machine learning ensemble, named MOMEM, used for predicting the soil shear strength. The proposed model consists of the RBFNN and MARS machine learning models. To construct a robust prediction approach, the AEFA metaheuristic is employed to optimize the hyper-parameters of the RBFNN and MARS. In addition, the dynamic weight values of the member models involved in the MOMEM ensemble are automatically determined by the AEFA. The general workflow of the proposed MOMEM for estimating the shear strength of soil is presented in Fig. 5.

Fig. 5 Procedure for constructing the MOMEM

4.1 Initialization phase

As presented in Fig. 5, the initialization phase uses purely the training data. Since attributes with larger numeric ranges tend to introduce undesirable bias into the construction of a machine learning model, all attribute values are converted to the same range of [0, 1] so that they enter the model construction process with equal weight. Herein, the conversion uses min-max normalization, as presented in Eq. (20). It is worth noting that the study employs a tenfold cross-validation method to divide the whole dataset; hence, the training dataset is formed by gathering nine of the ten folds, and the remaining fold is used for testing the trained machine learning model. The training dataset is then partitioned into two sets: one set with 70% of the data patterns is employed to train the model, and the other set with the remaining 30% of the data patterns is used to validate the newly trained model:

$$x_{i,j}^{\text{tran}} = \frac{{x_{i,j} - x_{{j,{ \hbox{min} }}} }}{{x_{{j,{ \hbox{max} }}} - x_{{j,{ \hbox{min} }}} }},$$
(20)

where xi,j is a data value of the jth attribute, xj,min is the lowest value of the jth attribute, xj,max is the highest value of the jth attribute, and \(x_{i,j}^{\text{tran}}\) is the transformed value of the jth attribute.
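The following sketch illustrates the data preparation of this phase: the min-max scaling of Eq. (20) and the 70/30 split of the training folds into training and validation subsets. The data are random placeholders, and deriving the scaling statistics from the training folds only is an assumption, since the text does not state where they are computed.

```python
# Sketch of the initialization phase: Eq. (20) scaling and the fold/70-30 partition.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(7)
X = rng.random((249, 12))               # placeholder: 249 samples, 12 soil factors
y = rng.random(249)                     # placeholder shear-strength values

def minmax_scale(a, a_min, a_max):
    """Eq. (20): x_tran = (x - x_min) / (x_max - x_min), applied column-wise."""
    return (a - a_min) / (a_max - a_min)

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(X):
    X_train_full, X_test = X[train_idx], X[test_idx]
    y_train_full, y_test = y[train_idx], y[test_idx]

    # Scaling parameters are derived from the nine training folds only (assumption)
    x_min, x_max = X_train_full.min(axis=0), X_train_full.max(axis=0)
    X_train_full = minmax_scale(X_train_full, x_min, x_max)
    X_test = minmax_scale(X_test, x_min, x_max)

    # 70% of the training folds fits candidate models, 30% validates them
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train_full, y_train_full, test_size=0.30, random_state=0)
    break   # one fold shown; the study repeats this for every fold
```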

The parameters of the MOMEM comprise the controlling parameters of the member inference models integrated into the ensemble and the weights used to combine them. Hence, blending more inference models into the framework increases the number of parameters. In this study, the reciprocal merits of MARS and RBFNN are blended to establish the MOMEM, so seven parameters (Nn, σ, Mmax, d, Imax, α, and β) need to be simultaneously fine-tuned to attain the optimal configuration of the MOMEM with the best generalizability. In this phase, each of the seven parameters is initially assigned random values within an extensive range to provide an evaluation basis for the searching loop of the next phase. The searching boundary of each parameter is set sufficiently large to cover all potentially suitable values of the concerned parameters, as shown in Table 2.

Table 2 Initial control parameter settings

4.2 Searching phase

At steps 1.1 and 1.2, the member models receive the parameter values generated by the AEFA searching engine and are constructed using the partitioned training data prepared in the previous phase. This step constructs many candidate models and thus inevitably consumes the most time compared with the other steps; blending more inference models into the meta-ensemble framework would further increase the computational time. In the MOMEM, the RBFNN and MARS models are established independently with the AEFA-assigned parameter values (Nn and σ) and (Mmax, Imax, and d), respectively. The trained models then generate prediction values for all data points of the training and validation sets. It should be emphasized that these parameter values are tuned to attain the greatest prediction accuracy for the proposed meta-learning model, the MOMEM, rather than for the individual models. It is important to note that many new models are created at each iteration to potentially replace the old ones.

As mentioned previously, the final prediction of the MOMEM is calculated by summing the predictions of the blended member models, each multiplied by its dynamic weight. Notably, the dynamic weights (α and β) express the influence of the corresponding constituent models on the final prediction of the MOMEM; they are tuned by the AEFA searching engine within the standardized range of [0, 1] shown in Table 2. The formula for merging the predictive values is expressed in Eq. (21). It is noted that in the traditional stacking ensemble method, all member models are assumed to have an equal role, which is expressed by assigning them equal (average) weights [56]. These subjective average values may diminish the generalizability of the ensemble model. Further, the AEFA searching engine can automatically discard any member model by assigning it a dynamic weight of 0 if it recognizes that the model undermines the accuracy of the MOMEM. In that case, the MOMEM becomes a single hybrid machine learner:

$$p_{\text{MOMEM}} = \alpha \times p_{\text{RBFNN}} + \beta \times p_{\text{MARS}} ,$$
(21)

where pMOMEM, pRBFNN, and pMARS indicate the prediction values of the MOMEM, RBFNN, and MARS, respectively; α and β are the dynamic weights of the RBFNN and MARS, respectively.

Table 3 Detailed results of 20 running times

Setting an appropriate objective function for the AEFA searching engine is crucial to successfully obtain a robust meta-learner. With the support of the AEFA, the joined individual models are likely to purely minimize the training error, which causes the final model to be trapped in over-fitting. It is worth noting that such a well-fitted model has poor generalizability since it is over-trained, with a high degree of complexity, to fit the training data only [57]. Inspired by the successes of other works [58,59,60,61], this study puts forward an objective function expressed as the sum of the validation error and the training error, as shown in Fig. 5. The present study employs the root mean square error (RMSE) in the objective function because this evaluation criterion additionally penalizes large deviations between actual and predicted values. The AEFA searching engine thus needs to reduce the sum of the training and validation RMSE concurrently, as shown in Eq. (22). The proposed objective function is therefore expected to alleviate the effect of over-fitting, increase the generalizability of the MOMEM, and reduce the number of large undesirable biases:

$${\text{fitness}} = {\text{RMSE}}_{\text{train}} + {\text{RMSE}}_{\text{validation}} .$$
(22)

The set of parameter values (Nn, σ, Mmax, Imax, d, α, and β) is synchronously adjusted by the transformation procedure of the AEFA. A poor set of values is replaced by better ones based on the obtained values of the objective function; therefore, the values of the tuning parameters are progressively improved after each loop, and so are the learnability and generalizability of the MOMEM. In each searching loop, the AEFA selects the best tuning parameters, i.e., those that provide the lowest value of the fitness function. The AEFA memorizes the suitable sets of parameters and extends the searching loop until the stopping criterion is met. This study uses 50 iterations as the stopping criterion.
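The sketch below shows how one AEFA candidate solution could be evaluated: the vector is decoded into the hyper-parameters and blending weights, the two member models are fitted, their predictions are blended per Eq. (21), and the objective of Eq. (22) is returned. The fit_rbfnn and fit_mars callables (and their keyword names) are hypothetical stand-ins, since the actual training routines are not spelled out here.

```python
# Sketch of the objective the AEFA engine minimizes (Eqs. 21-22).
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def momem_fitness(candidate, data, fit_rbfnn, fit_mars):
    """Decode one candidate [Nn, sigma, Mmax, Imax, d, alpha, beta] and score it."""
    Nn, sigma, Mmax, Imax, d, alpha, beta = candidate
    X_tr, y_tr, X_val, y_val = data

    # Hypothetical fitter interfaces: each returns a callable predictor
    rbfnn = fit_rbfnn(X_tr, y_tr, n_neurons=int(round(Nn)), width=sigma)
    mars = fit_mars(X_tr, y_tr, max_terms=int(round(Mmax)),
                    max_interaction=int(round(Imax)), penalty=d)

    blend = lambda X: alpha * rbfnn(X) + beta * mars(X)          # Eq. (21)
    return rmse(y_tr, blend(X_tr)) + rmse(y_val, blend(X_val))   # Eq. (22)

# Toy usage with trivial stand-in "fitters" (each returns a constant predictor)
rng = np.random.default_rng(3)
data = (rng.random((70, 12)), rng.random(70), rng.random((30, 12)), rng.random(30))
dummy_fit = lambda X, y, **kw: (lambda Xq: np.full(len(Xq), y.mean()))
print(momem_fitness([30, 1.0, 50, 2, 3.0, 0.5, 0.5], data, dummy_fit, dummy_fit))
```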

4.3 Optimal meta-learner and application phase

The searching process halts when the stopping condition is met, indicating that the optimal set of tuning parameters (Nn, σ, Mmax, Imax, d, α, and β) is available to train the MOMEM with the entire training dataset. The trained MOMEM is then saved to perform prediction tasks in the testing phase or to generate predictions for new data inputs. It is recommended to update and re-train the model when a sufficient number of new data points becomes available. More soil-shear-strength data enable the MOMEM to discover more underlying features and thus to approximate the underlying function of the soil's shear strength more closely.

5 Experimental results and discussions

5.1 Performance evaluation criteria

The performance of an AI-based inference model should be evaluated and compared based on several criteria covering different aspects, so that the conclusions of the assessment are reliable. This study uses the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R2) to benchmark the performance of the MOMEM against the other models.

In detail, MAPE expresses the model accuracy as a percentage and highlights large relative errors at small actual values. RMSE emphasizes undesirably sizable deviations between actual and predicted values, while MAE gives the average prediction bias with equal weight assigned to all errors. Better models attain smaller values of RMSE and MAE. Meanwhile, R2 indicates the capability of the model to reasonably infer future outcomes, with a maximum value of 1. The formulae of RMSE, MAE, MAPE, and R2 are expressed in Eqs. (23)–(26):

$${\text{RMSE}} = \sqrt {\left( {1/n} \right) \times \mathop \sum \limits_{i = 1}^{n} [p_{i} - y_{i} ]^{2} }$$
(23)
$${\text{MAE}} = \left( {1/n} \right) \times \mathop \sum \limits_{i = 1}^{n} \left( {\left| {p_{i} - y_{i} } \right|} \right)$$
(24)
$${\text{MAPE}} = \left( {1/n} \right)\mathop \sum \limits_{i = 1}^{n} \left( {\left| {p_{i} - y_{i} } \right|/y_{i} } \right) \times 100$$
(25)
$$R^{2} = 1 - {\text{SSR}}/{\text{SST}} = 1 - \mathop \sum \limits_{i = 1}^{n} (y_{i} - p_{i} )^{2} /\mathop \sum \limits_{i = 1}^{n} (y_{i} - \bar{y})^{2}$$
(26)

where SST is the total sum of squares; SSR is the sum of squared residual errors; pi is the predicted value; yi is the actual value; \(\bar{y}\) is the average of the actual values; and n is the number of data patterns.
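For reference, the four criteria of Eqs. (23)–(26) can be computed directly as follows; the numbers in the toy check are placeholders.

```python
# Direct implementations of the evaluation criteria in Eqs. (23)-(26).
import numpy as np

def rmse(y, p):
    return float(np.sqrt(np.mean((p - y) ** 2)))        # Eq. (23)

def mae(y, p):
    return float(np.mean(np.abs(p - y)))                # Eq. (24)

def mape(y, p):
    return float(np.mean(np.abs(p - y) / y) * 100.0)    # Eq. (25), assumes y != 0

def r2(y, p):
    ss_res = np.sum((y - p) ** 2)                        # sum of squared residuals
    ss_tot = np.sum((y - np.mean(y)) ** 2)               # total sum of squares
    return float(1.0 - ss_res / ss_tot)                  # Eq. (26)

# Toy check with placeholder shear-strength values (kg/cm2)
y_actual = np.array([0.30, 0.25, 0.40, 0.35])
y_pred = np.array([0.28, 0.27, 0.38, 0.36])
print(rmse(y_actual, y_pred), mae(y_actual, y_pred),
      mape(y_actual, y_pred), r2(y_actual, y_pred))
```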

5.2 Experimental results and discussion

In order to avoid bias in the data partition process, a tenfold cross-validation approach is applied to split the dataset [62]; thus, all data patterns are, in turn, assigned to both the training and testing sets. Since this study compares the AI inference models statistically using means and standard deviations, all models perform the prediction task over 20 runs; hence, the ten folds are used twice. The models were run in the MATLAB environment, version 2018a [63].
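A minimal sketch of this evaluation protocol, assuming the 20 runs correspond to two repetitions of the tenfold split (the exact repetition scheme is not detailed here), is given below with placeholder data and a placeholder predictor.

```python
# Sketch of the 20-run protocol: tenfold cross-validation repeated twice.
import numpy as np
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(11)
X, y = rng.random((249, 12)), rng.random(249)                   # placeholder data

rkf = RepeatedKFold(n_splits=10, n_repeats=2, random_state=0)   # 10 folds x 2 = 20 runs
test_rmse = []
for train_idx, test_idx in rkf.split(X):
    # ... train the model on X[train_idx], y[train_idx] here (omitted) ...
    y_pred = np.full(len(test_idx), y[train_idx].mean())        # placeholder predictor
    test_rmse.append(np.sqrt(np.mean((y[test_idx] - y_pred) ** 2)))

print(len(test_rmse), np.mean(test_rmse), np.std(test_rmse))    # 20 runs -> mean and Std.
```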

Table 3 displays the performance of the MOMEM on the soil-shear-strength data over the 20 runs. As seen in the table, the MOMEM attains a very high accuracy for inferring values of soil shear strength, expressed by the relatively low average values of RMSE (0.035 kg/cm2), MAPE (7.94%), and MAE (0.026 kg/cm2) in the testing phase. Further, the low standard deviations of those criteria strongly confirm that the high performance of the MOMEM remains stable and is only marginally affected by the data partition. Interestingly, there is only a slight difference between the values of the evaluation criteria in the testing and training phases: the absolute deviations of MAPE and RMSE between the two phases are only 0.25% and 0.003 kg/cm2, respectively. These facts indicate that the over-fitting issue is addressed efficiently in the MOMEM by summing the training and validation errors in the objective function.

The sets of parameter values found over the 20 runs are shown in Table 4. Notably, the values of all parameters vary over large ranges. In particular, the optimal spread of the Gaussian function in the RBFNN lies between 0.018 and 4.979, while the maximum number of neurons is 60, roughly ten times the minimum number. These variations reflect the need to capture the different characteristics of each training and validation dataset. Apparently, the chance that a trial-and-error method or experience-based parameter setting would successfully determine the optimal configuration of the MOMEM is limited.

Table 4 The obtained control parameters

As exhibited in Fig. 6, the roles of the member models in the MOMEM change across runs, as identified by the values of α and β, which are found in the range [0, 0.997]. In particular, the RBFNN completely dominated the MARS in the construction of the MOMEM in the 16th run, when α and β were found to be 0.997 and 0, respectively. Conversely, the RBFNN was discarded from the final MOMEM in the 12th and 17th runs. A dynamic weight of 0 indicates that the corresponding constituent model (RBFNN or MARS) was recognized as impairing the accuracy of the MOMEM; thus, the final prediction values were produced by only one of the two constituent models. Accordingly, the MOMEM becomes a pure hybrid of the AEFA with the RBFNN or with the MARS, abbreviated as AEFA-RBFNN and AEFA-MARS, respectively.

Fig. 6 The weights of member models in the proposed MOMEM

It is worth noting that the found parameter values lie well within the pre-defined ranges, indicating that the boundary of each range was sufficient to cover all potential values of the parameters. Figure 7 exhibits the convergence curves of a particular run, which demonstrate the rapid convergence of the AEFA. In detail, the optimal values were found at approximately iteration 35 of 50, affirming that the fittest values are determined before the stopping condition is met.

Fig. 7 Typical convergence curves

5.3 Result comparison and discussion

It is necessary to compare the performance of the developed model against that of other AI-based inference models so that the role of each technique in the MOMEM is clearly identified. The RBFNN, MARS, generalized regression neural network (GRNN), and support vector regression with a Gaussian kernel function (SVRgau) are selected because they represent different types of machine learners and are highly regarded for solving engineering-related problems in the literature. The AEFA-MARS, AEFA-RBFNN, and traditional MARS + RBFNN ensemble models are also included in the comparison to quantify the contribution of each technique integrated into the MOMEM. All models were run in the MATLAB environment 2018a [63], with a trial-and-error method used to find potentially suitable configurations. The interface platforms are presented in Fig. 8.

Fig. 8 Interface platform for predicting the shear strength of soil, built with MATLAB GUIDE

The statistical results shown in Table 5 demonstrate that the proposed model, the MOMEM, is the best model, achieving the lowest values of RMSE (0.035 kg/cm2), MAPE (7.94%), and MAE (0.026 kg/cm2). Those values are considerably lower than those of the second-best model, the AEFA-RBFNN, which attains RMSE, MAPE, and MAE values of 0.039 kg/cm2, 9.05%, and 0.029 kg/cm2, respectively, in the testing phase. There are at least 11.4% and 11.5% improvements in terms of RMSE and MAE, respectively, when using the MOMEM to predict soil shear strength instead of the AEFA-RBFNN and the AEFA-MARS (the third-best model). Notably, in machine learning, the generalizability of a model is the first priority since it represents the ability of the model to infer new facts. It is thus acceptable that the MOMEM obtained higher values of RMSE (0.032 kg/cm2), MAE (0.026 kg/cm2), and MAPE (7.69%) than the AEFA-RBFNN and the AEFA-MARS in the training phase.

Table 5 Shear strength prediction performance of the MOMEM

In comparison with the traditional MARS + RBFNN ensemble model, the MOMEM is a dominant model, as it reduces the RMSE and MAE values by up to 16% and 23%, respectively. These numbers affirm that the AEFA searching engine and the re-defined dynamic weights fill the performance gap of the traditional ensemble method. The statistical results also reveal that the conventional single machine learners, including the MARS, RBFNN, GRNN, and SVRgau models, are not efficient in diminishing the bias in predicting soil shear strength.

A well-captured cause-and-effect relationship between the soil's shear strength and its attributes leads to a high value of R2; the MOMEM mapped the underlying function of the soil's shear strength most closely by attaining the greatest values of R2 (0.864 and 0.826 for the training and testing phases, respectively). In other words, 82.6% of the variability in the data can be explained by the MOMEM-mapped function, compared with only 77.7% for the second-best model, the AEFA-RBFNN. Figure 9 presents the actual versus predicted values of all data patterns in the testing phase. Generally, the obtained results firmly demonstrate the possibility of boosting the accuracy of soil-shear-strength estimation by integrating the AEFA into the fusion of the MARS and the RBFNN.

Fig. 9 Actual versus predicted values: a MOMEM, b AEFA-MARS, c AEFA-RBFNN, d MARS + RBFNN

5.4 One-tail t test method for examining the mean difference

This study further conducted a one-tailed t test to examine the significance of the differences between the MOMEM's performance in estimating the shear strength of soil and that of the other AI models. The one-tailed t test was calculated on the root mean square error (RMSE) values, with equal sample sizes of 20 (run times) and unknown variances. It is worth noting that the RMSE samples are assumed to be normally distributed. The procedure for implementing the one-tailed t test is as follows:

  • \(H_{0}{:}\quad {\text{RMSE}}_{\text{MOMEM}} - {\text{RMSE}}_{\text{others}} = 0\)

  • \(H_{1}{:}\quad {\text{RMSE}}_{\text{MOMEM}} - {\text{RMSE}}_{\text{others}} < 0\)

$${\text{Degree of freedom:}}\quad v = (n - 1)(s_{1}^{2} + s_{2}^{2} )^{2} /(s_{1}^{4} + s_{2}^{4} )$$
(27)
$${\text{Statistical test:}}\quad t = \sqrt n \left( {\bar{x}_{1} - \bar{x}_{2} } \right)/\sqrt {s_{1}^{2} + s_{2}^{2} } ,$$
(28)

where n is the number of samples (n = 20); ν is the degree of freedom; \(s_{1}^{2}\) and \(s_{2}^{2}\) are the unbiased estimators of the variances of the two samples; and the denominator of t is the standard error of the difference between the two sample means \(\bar{x}_{1}\) and \(\bar{x}_{2}\).
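A small sketch of this test, implementing Eqs. (27) and (28) directly, is given below; the RMSE samples are hypothetical values, not the results reported in Table 6.

```python
# Sketch of the one-tailed t test of Eqs. (27)-(28) on two equal-size RMSE samples.
import numpy as np
from scipy import stats

def one_tailed_t(sample_momem, sample_other, alpha=0.05):
    """H0: mean difference = 0 vs H1: MOMEM mean < other mean (smaller error)."""
    n = len(sample_momem)                                   # equal sample sizes (n = 20)
    v1 = np.var(sample_momem, ddof=1)                       # unbiased variance estimates
    v2 = np.var(sample_other, ddof=1)
    t = np.sqrt(n) * (np.mean(sample_momem) - np.mean(sample_other)) / np.sqrt(v1 + v2)  # Eq. (28)
    dof = (n - 1) * (v1 + v2) ** 2 / (v1 ** 2 + v2 ** 2)                                 # Eq. (27)
    t_crit = stats.t.ppf(alpha, dof)                        # lower-tail critical value (negative)
    return t, t_crit, t < t_crit                            # reject H0 if t < t_crit

# Toy usage with hypothetical RMSE samples from 20 runs (not the paper's data)
rng = np.random.default_rng(5)
rmse_momem = rng.normal(0.035, 0.005, 20)
rmse_other = rng.normal(0.039, 0.005, 20)
print(one_tailed_t(rmse_momem, rmse_other))
```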

The calculated results at a confidence level of 95% (α = 0.05) are presented in Table 6. For all cases, the t statistic is smaller than the one-tailed critical value of − 1.69, indicating that the MOMEM significantly outperformed the other models in reducing the RMSE of the soil-shear-strength predictions. This conclusion is also illustrated in Fig. 10. In conjunction with the robust performance analyzed above, the MOMEM requires no manual configuration in the soil-shear-strength estimation process. In general, the analysis results support the MOMEM as the best choice for civil engineers, as it is able to yield reliable estimates of the soil's shear strength.

Table 6 Hypothesis testing for comparing models’ performance
Fig. 10 Box plots of the evaluation criteria of the top comparative models

The construction of the MOMEM, which hybridizes several AI techniques, inevitably increases the complexity of the model structure. Thus, the MOMEM needs more computation time than the other models during model construction. Considering the many obtained benefits, including the significantly improved prediction accuracy of soil shear strength and user-friendliness, the additional computation time is justified and acceptable.

6 Concluding remarks

The shear strength is the maximum resistance or stress that a particular soil can offer against shear failure under applied loading. The shear strength of soils is essential for any stability analysis; therefore, it is crucial to determine reliable values of the soil shear strength. A novel meta-learner, called the metaheuristic-optimized meta-ensemble learning model (MOMEM), was created by combining the RBFNN, the MARS, and the AEFA search engine. In the MOMEM, the AEFA applies its optimization procedure to identify suitable configurations of the member inference models, namely the RBFNN (Nn and σ) and the MARS (Mmax, Imax, and d), by adjusting their control parameters. Simultaneously, the AEFA fine-tunes the model weights of the RBFNN and MARS (α and β); the final prediction of the MOMEM is the sum of the member models' predictions multiplied by their corresponding weights.

The performance of the MOMEM was validated on 249 data patterns of soil shear strength collected during the site investigation of the Le Trong Tan Project, Hanoi, Vietnam. The statistical results of 20 runs based on a tenfold cross-validation technique show that the MOMEM is the best model for predicting the shear strength of soil, attaining the best values of RMSE (mean = 0.035 kg/cm2; Std. = 0.005 kg/cm2), MAE (mean = 0.026 kg/cm2; Std. = 0.004 kg/cm2), MAPE (mean = 7.9%; Std. = 1.72%), and R2 (mean = 0.826; Std. = 0.055). The one-tailed t test confirmed that the obtained results are significantly better than those of the other comparative AI models, including variants of the RBFNN, the MARS, the conventional RBFNN + MARS ensemble, and the SVRgau.

The findings demonstrate that the AEFA searching engine can autonomously determine the best set of parameter values to adaptively exploit the features of each training dataset. The AEFA is able to recognize and discard a member model that damages the accuracy of the MOMEM; in such cases, the MOMEM becomes a single hybrid model of a metaheuristic algorithm and a machine learner. In summary, the AEFA plays an integral role in combining the merits of the RBFNN and MARS so that they best fit within the MOMEM.

This study is the first to put forward a stacking-based ensemble model that hybridizes MARS and RBFNN, which represent different types of learners in the field of machine learning, so that their reciprocal merits are shared in the meta-ensemble model. Additionally, the present study lifts the operation of the ensemble model to a higher level by integrating a robust metaheuristic optimization algorithm, the AEFA, to automatically attain the maximum performance of the model combination. With the success of the MOMEM in addressing the soil-shear-strength prediction problem, this study further contributes a framework for establishing metaheuristic stacking-based ensemble models, which is expected to inspire scholars to create new models for solving other practical problems.

Despite possessing many advantages, the MOMEM also has two weaknesses. First, the MOMEM performs the inference process as a black box, since the mapped function of the soil shear strength is not explicitly formulated. Second, the model construction takes a long time due to the optimization process and the two inference models involved in the training process. However, the computational time can be sharply shortened by using modern high-speed computers. Further, once the optimal model is found, it can quickly infer the outcomes of new data patterns and can be saved for long-term use. The saved model should be updated as new data points are collected, which is a task the authors plan to implement in the future.