1 Introduction

China is rich in coal resources and has long depended heavily on coal for industrial production and daily life. Coal mine gas is a major factor affecting safe production in coal mines, and the prediction of gas emission in the working face is the main basis for determining the gas emission grade of a mine or mining level (Zhou et al. 2020a, b; Gao et al. 2020; Long et al. 2021). In 1964, Lindine (Shanjun 1998) established the first empirical model for predicting gas emission in coal mines. Since then, the mine statistics method, the separate-source prediction method, and the gas geological map method have gradually been applied to gas emission prediction (Zhang and Zhang 2005; Dai et al. 2007). However, these methods do not treat gas emission and migration as a dynamic nonlinear system. For decades, the technology for predicting gas emissions from underground coal mining has been the subject of extensive research, ranging from simple geometric models to modern finite element models (Wang et al. 2015; Guo et al. 2020; Liu et al. 2021). Researchers have used experiments and numerical simulations to study the occurrence of coal seam gas. In addition, to predict the gas emission rate of the longwall working face, a numerical gas emission model was established on the basis of ventilation pressure and flow surveys of the entire mine (p-Q survey) (Karacan 2008; Guo et al. 2012). The mathematical method of gas geology based on case analysis (Zhang and Yuan 1999; Zhang et al. 2009) has developed rapidly. This method mainly uses machine learning algorithms and data mining techniques to establish a predictive model; by analyzing real-time gas emission data, these techniques can account for the dynamic changes of multiple factors. For example, methods based on statistics, principal component analysis (PCA), and artificial neural networks (ANN) have been used to predict the ventilation gas emission rate of longwall mines in the United States (Karacan and Goodman 2012; Karacan and Olea 2014).

Recently, researchers have paid increasing attention to parameter selection and model establishment for gas emission prediction (typical references are summarized in Table 1).

Table 1 Typical references focusing on gas emission influencing factors

Table 1 shows that common factors include coal seam thickness, buried depth, dip angle, gas content in coal seam, floor elevation of the coal seam, spacing between adjacent layers, thickness of adjacent layers of the coal seam, daily output, daily advancing distance, and pure amount of gas extraction. Most previous research on gas emission prediction focuses on only a single parameter combination or a single prediction algorithm. However, the accuracy, generalization, and reliability of a gas emission prediction method based on case analysis depend mainly on the influencing factors considered and the algorithm selected. Consequently, this limitation of existing prediction models must be overcome, and various feature combinations should be effectively matched with different machine learning algorithms.

The aim of this work is to propose a new gas emission prediction method. For a series of factors influencing gas emission, feature selection methods were used to form different factor combinations, and various machine learning algorithms were applied to every combination. The best-matched combinations of factors and algorithms were then selected. This new prediction method avoids the limitation of using a single combination of factors and a single machine learning algorithm, as in previously published papers.

The new method involves multiple characteristic parameters, algorithms, combinations, and judgment indicators. The Pearson correlation coefficient, full subset regression, recursive feature elimination (RFE), and random forest (RF) were applied to determine the optimal combinations of gas emission feature parameters. Gaussian process regression (GPR), support vector machine (SVM), least squares SVM (LS-SVM), gradient boosted regression tree (GBRT), RF, multilayer perceptron (MLP), BP neural network (BPNN), and Elman neural network (ENN) were applied to construct dynamic prediction models with multi-parameter combinations. Normalized mean square error (NMSE), mean absolute percentage error (MAPE), Theil IC (TIC), and the judgment coefficient (R2, the coefficient of determination) were applied to evaluate the accuracy of the models comprehensively. The new technique can provide a basis for the accurate prediction of gas emission.

2 Data processing

2.1 Data instance acquisition

The influencing factors considered in this paper can be divided into geological and mining factors, which are called first-level indicators. The secondary indicators that characterize geological factors include coal seam thickness (M), buried depth (H), dip angle (D), gas content in coal seam (GC), floor elevation of the coal seam (BLV), spacing between adjacent layers (SD), and thickness of adjacent layers (ML) of the coal seam. The secondary indicators that characterize mining factors include the daily output (DO), daily advancing distance (V), and pure amount of gas extraction (EP) of the working face. The data used for prediction were taken from Ma (2017) and Yan (2020).

2.2 Analysis of acquired data

A total of 60 groups of statistical parameters are shown in Fig. 1. To improve the generalization ability of the model and prevent overfitting, the data set was shuffled randomly and then divided into a training set (40 groups) and a verification set (20 groups), a ratio of 2:1. The training set was used for model training, whereas the verification set was used to verify and evaluate the reliability and generalization performance of the trained model.
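The shuffle-and-split step can be illustrated with a minimal Python sketch. The function name, the seed, and the integer placeholders standing in for the 60 records are assumptions made only for this example; the paper does not state how the shuffling was implemented.

```python
import random

def shuffle_split(records, n_train=40, seed=0):
    """Shuffle a copy of the records, then split into training and verification sets."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumed seed)
    return shuffled[:n_train], shuffled[n_train:]

# 60 placeholder records standing in for the 60 groups of parameters
records = list(range(60))
train_set, verification_set = shuffle_split(records)
print(len(train_set), len(verification_set))  # 40 20
```

Fixing the seed keeps the 40/20 partition reproducible, which matters when many models are compared on the same verification set.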

Fig. 1 Box plot of various gas emission parameters. Notes: The upper and lower data represent the maximum and minimum values of each parameter, respectively, and the red data represent the average values

2.3 Data standardization

The 10 input parameters in the gas emission data set are all numerical, but their value ranges vary and may even differ by orders of magnitude. To obtain accurate prediction results and ensure that each parameter contributes, Z-score standardization was applied to the parameters to reduce the influence of parameter scale on the model.

The sequence x1, x2, …, xn is transformed:

$$\overline{x} = \frac{1}{n}\sum\limits_{i = 1}^{n} {x_{i} } $$
(1)
$$s = \sqrt {\frac{1}{n - 1}\sum\limits_{i = 1}^{n} {(x_{i} - \overline{x} )^{2} } }$$
(2)
$$h_{i} = \frac{{x_{i} - \overline{x} }}{s}$$
(3)

where xi is the original sequence, i ∈ [1, n]; \(\overline{x}\) is the average value of the sequence, s is the standard deviation, and hi is the new sequence after transformation, i ∈ [1, n].
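As a small worked example of Eqs. (1) to (3), the sketch below standardizes one parameter with Python's standard library. The sample values are hypothetical and are not taken from the data set; `statistics.stdev` is used because it applies the n − 1 denominator of Eq. (2).

```python
import statistics

def z_score(values):
    """Standardize a sequence: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)            # Eq. (1)
    s = statistics.stdev(values)              # Eq. (2), n - 1 denominator
    return [(x - mean) / s for x in values]   # Eq. (3)

# Hypothetical values of a single parameter (illustrative only)
thickness = [1.5, 2.0, 2.5, 3.0, 3.5]
h = z_score(thickness)
print([round(v, 3) for v in h])
```

After the transformation, each parameter has zero mean and unit sample standard deviation, so parameters with large numerical ranges no longer dominate the others.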

3 Gas emission prediction model establishment process and primary selection

3.1 Establishment of the prediction model

(1) Sample data processing. The data set is standardized by the Z-score method.

(2) Combination selection of feature parameters and determination of algorithm hyper-parameters. The training set was divided into five parts, and five-fold grid search cross-validation was performed: in each round, a different part was used as the validation set, and the four remaining parts were combined as the training set. Each sample therefore served as a validation sample in one round and as a training sample in four rounds. In the grid search, a series of a priori candidate values was first given for each algorithm-related parameter, all combinations of parameter values were tested through loop traversal, and the combination that gave the algorithm its highest accuracy was taken as the optimal set of parameters.

(3) Establishment of the prediction model. Different supervised algorithms and characteristic parameters were used to establish the gas emission prediction models.

(4) Primary selection of the collaborative model. By analyzing the verification set data, the algorithms and parameter combinations with an average judgment coefficient R2 greater than 0.80 were selected, which gave the preliminary set of collaborative prediction models.

(5) Collaborative model optimization. Among the above prediction models, those with a sum of MAPE and TIC less than 0.1 were selected (Ashis et al. 2013; Jin et al. 2020; Seçkin et al. 2020); then, those with a maximum relative error (REmax) less than 15% and a mean relative error (MRE) less than 10% were chosen as the optimized collaborative models.

(6) Collaborative model prediction. The predicted value (\(\hat{y}_{i}\)) was obtained by averaging the predictions of all the optimized collaborative models.

The prediction flow of gas emission in the working face is shown in Fig. 2.

Fig. 2 Flow chart for establishing the prediction model
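The five-fold grid search cross-validation of step (2) can be sketched as follows. This is only an illustration under stated assumptions: a simple k-nearest-neighbour regressor on one input stands in for the eight algorithms of the paper, the candidate grid (`k`, `weighted`) is hypothetical, and the 40 training samples are synthetic.

```python
import itertools
import random

def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if not weighted:
        return sum(y for _, y in nearest) / k
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def grid_search_cv(data, grid, n_folds=5):
    """Return (best_params, scores); scores maps each value tuple to its mean validation MSE."""
    folds = [data[i::n_folds] for i in range(n_folds)]    # five disjoint parts
    scores = {}
    for values in itertools.product(*grid.values()):      # traverse all combinations
        params = dict(zip(grid, values))
        fold_mse = []
        for i, validation in enumerate(folds):
            # the four remaining parts are combined as the training set
            train = [p for j, fold in enumerate(folds) if j != i for p in fold]
            errors = [(y - knn_predict(train, x, **params)) ** 2 for x, y in validation]
            fold_mse.append(sum(errors) / len(errors))
        scores[values] = sum(fold_mse) / n_folds
    best = min(scores, key=scores.get)                    # lowest error = highest accuracy
    return dict(zip(grid, best)), scores

# Synthetic 40-sample training set: y = 2x plus noise (illustrative only)
rng = random.Random(0)
data = [(i / 4, 2.0 * (i / 4) + rng.gauss(0.0, 0.5)) for i in range(40)]
rng.shuffle(data)
grid = {"k": [1, 3, 5, 7], "weighted": [False, True]}
best_params, cv_scores = grid_search_cv(data, grid)
print(best_params)
```

Each sample appears in exactly one validation fold and in four training folds, which matches the description in step (2); the selected parameters are those with the lowest mean validation error across the five rounds.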

3.2 Primary selection of forecasting model

3.2.1 Feature combination selection

Feature selection refers to the selection of a feature subset from a feature set according to importance. Too few variables lead to low model accuracy, whereas too many do not necessarily increase accuracy and may cause over-fitting. Furthermore, different machine learning algorithms differ in their sensitivity to feature combinations. Therefore, the main function of feature selection is to strengthen the generalization ability of the model, reduce over-fitting, and improve the understanding of the relationship between the features and their values. Generally, feature selection methods can be divided into three categories: direct method, univariate feature selection, and multivariate feature selection. In this paper, the Pearson correlation coefficient method, full subset regression, RFE, and RF were used to obtain the best input variable combinations (Fig. 3).

Fig. 3 Results of feature selection methods. Notes: The black box in c represents the selected parameter, and the blue dotted line represents the division of different parameter combinations, a total of 17 groups

Figure 3a shows the correlation analysis using the Pearson correlation coefficient method (Dominic et al. 2020). The Pearson correlation coefficient measures the linear correlation between two variables N and M, and its value lies between −1 and 1. It can be expressed as:

$$r = \frac{{\sum\limits_{i = 1}^{n} {(N_{i} - \overline{N} )} (M_{i} - \overline{M} )}}{{\sqrt {\sum\limits_{i = 1}^{n} {(N_{i} - \overline{N} )}^{2} } \sqrt {\sum\limits_{i = 1}^{n} {(M_{i} - \overline{M} )}^{2} } }}$$
(4)

where N and M represent two continuous variables observed in pairs, and \(\overline{N}\) and \(\overline{M}\) are their mean values.

According to Eq. (4) and the usual classification rules for the Pearson correlation coefficient, an absolute value greater than 0.4 is regarded as at least moderate correlation. In this example, the variables that are at least moderately correlated with gas emission are taken as input variables, and the dashed box in Fig. 3a marks the correlation between gas emission and each parameter.
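The screening rule above can be sketched in a few lines of Python. The feature values below are hypothetical toy numbers, not the paper's data, and the helper names are illustrative; only Eq. (4) and the 0.4 threshold come from the text.

```python
import math

def pearson(n_values, m_values):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (4)."""
    n_mean = sum(n_values) / len(n_values)
    m_mean = sum(m_values) / len(m_values)
    cov = sum((a - n_mean) * (b - m_mean) for a, b in zip(n_values, m_values))
    n_ss = sum((a - n_mean) ** 2 for a in n_values)
    m_ss = sum((b - m_mean) ** 2 for b in m_values)
    return cov / math.sqrt(n_ss * m_ss)

def select_by_correlation(features, target, threshold=0.4):
    """Keep the names of features whose |r| with the target exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) > threshold]

# Hypothetical toy values (illustrative only)
emission = [3.0, 4.1, 5.2, 5.9, 7.1]
features = {
    "GC": [2.0, 2.5, 3.1, 3.4, 4.0],        # rises with emission
    "D": [10.0, 12.0, 10.0, 12.0, 11.0],    # only weakly related
}
selected = select_by_correlation(features, emission)
print(selected)  # ['GC']
```

The absolute value is used so that strongly negative correlations are retained as well as strongly positive ones.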

RFE is a wrapper feature selection method in which the search starts from all features, and the evaluation criterion is the mean square error of each feature grouping. In each iteration, the least relevant feature is eliminated, and the combination with the smallest mean square error is the optimal feature subset (You et al. 2014; Ke et al. 2015) (Fig. 3b). In Fig. 3b, the abscissa represents the number of features, whereas the ordinate represents the mean square error of the corresponding group. The mean square error was smallest when the number of features was 10 (all features).

Full subset screening is based on all possible combinations of the independent variables. Each variable combination was fitted by the least squares method, and the models with an adjusted coefficient of determination greater than 0.9 were selected among all possible models (Zhang et al. 2019a, b). The selection result is shown in Fig. 3c. In this example, 17 optimal combinations were obtained through full subset screening, and the adjusted determination coefficients of these 17 combinations were all greater than 0.9.
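A minimal sketch of full subset screening is given below, assuming ordinary least squares with an intercept and the standard adjusted R2 formula, 1 − (1 − R2)(n − 1)/(n − p − 1). The three features `x1`, `x2`, `x3` and the target are synthetic, and the small linear solver is included only to keep the example free of external libraries.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def adjusted_r2(columns, y):
    """Fit y on the feature columns (plus intercept) by least squares; return adjusted R2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # normal equations (X^T X) beta = X^T y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def full_subset_screen(features, y, threshold=0.9):
    """Return every feature subset whose adjusted R2 exceeds the threshold."""
    names = sorted(features)
    kept = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if adjusted_r2([features[name] for name in subset], y) > threshold:
                kept.append(subset)
    return kept

# Synthetic data: the target depends on x1 and x2 only (illustrative)
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 8.0, 1.0, 6.0, 2.0, 7.0, 4.0, 5.0],
    "x3": [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 4.0, 8.0],
}
y = [2.0 * a + 3.0 * b for a, b in zip(features["x1"], features["x2"])]
kept = full_subset_screen(features, y)
print(kept)
```

With 10 candidate factors, as in this paper, the same loop would examine 2^10 − 1 = 1023 non-empty subsets, which remains cheap for a least squares fit.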

In RF, a large number of decision trees is used for feature selection (Speiser et al. 2019), and the variables obtained from each decision tree are synthesized to obtain the final variable importance ranking (Fig. 3d). In this example, according to the RMSE and the residual sum of squares, all factors except the buried depth of the coal seam, nine in total, are selected.

In summary, the Pearson correlation coefficient, RFE, full subset regression, and RF were used to screen the 10 influencing factors according to different criteria. A total of 20 sets of feature combinations were obtained (Fig. 4).

Fig. 4 Combination set of feature parameters affecting gas emission. Notes: F-1 represents the first feature combination, F-2 represents the second feature combination, and so on. The feature combination of the same color is selected by the corresponding feature selection method on the left

3.2.2 Selection of prediction algorithm

(1) Regression algorithm. GPR (Mahmoodzadeh et al. 2021; Noori et al. 2019) has good adaptability and strong generalization ability for high-dimensional, small-sample, nonlinear, and complex problems. Compared with neural networks and SVM, this method has the advantages of easy implementation and adaptive acquisition of hyper-parameters. SVM (Qian et al. 2014; Zhou et al. 2012) has shown many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems. The ultimate goal of the LS-SVM (Xue and Xiao 2017) optimization problem is to obtain the optimized model parameters. The linear decision function constructed by LS-SVM not only has good fitting performance but also has strong generalization ability.

(2) Neural network. A layered structure is the essential feature of MLP (Teresa and Wilson 2013), which includes an input layer, an output layer, and one or more hidden layers. The number of hidden layers is not fixed and can be chosen according to the requirements. The number of neurons in the output layer is likewise not restricted. BPNN (Zhang et al. 2019a, b; Zhao et al. 2021) is a multi-layer perceptron network trained by error back propagation and consists of an input layer, at least one hidden layer, and an output layer. ENN (Xie et al. 2019) is a kind of dynamic feedback network that has not only input, hidden, and output layer units but also a special connection unit. The special connection unit can be regarded as a time delay operator that gives the network a dynamic memory function.

(3) Ensemble learning. In ensemble learning, a series of learners is used, and a certain rule is adopted to integrate their results to obtain significantly better generalization performance than a single learner. In this paper, in addition to the ensemble algorithms, six single machine learning algorithms were used so that ensemble and single algorithms could be compared and a more comprehensive set of methods could be applied to establish the gas emission prediction model. The main methods in ensemble learning include boosting and bagging, which differ in their combination rules.

The main idea of boosting is to assemble diverse weak learners into a strong learner by combining them linearly through an additive model. GBRT (Zhou et al. 2020a, b; Persson et al. 2017) is a boosting method in which each iteration builds a new model to reduce the residual error of the previous one. In the other ensemble learning method, bagging, there is no strong dependence among the individual learners. RF (Lu et al. 2016) is an extension of the bagging algorithm: at each split, an optimal feature is selected from a randomly chosen subset of the sample features to divide the left and right subtrees of the decision tree, which further enhances the generalization ability of the model.

By combining the 20 feature combinations in Fig. 4 with eight different supervised learning algorithms, 160 gas emission prediction models for the working face (20 × 8) are constructed. These prediction models are evaluated on the 20 groups of data in the verification set, and the resulting R2 values are shown in Table 2. R2 is calculated using Eq. (5) as follows.

$${R}^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {(y_{i} - \widehat{y}_{i} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {(y_{i} - \overline{y} )^{2} } }}$$
(5)

where yi is the true value, \(\hat{y}_{i}\) is the predicted value, \(\overline{y}\) is the mean of the true values, and i ∈ [1, n].
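For reference, Eq. (5) translates directly into the short Python function below. The four true and predicted values are hypothetical numbers chosen so that the result is easy to check by hand.

```python
def r2_score(y_true, y_pred):
    """Judgment coefficient (coefficient of determination) as defined in Eq. (5)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)              # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical verification values (illustrative only)
y_true = [4.0, 5.0, 6.0, 7.0]
y_pred = [4.2, 4.9, 6.1, 6.8]
r2 = r2_score(y_true, y_pred)
print(round(r2, 3))  # 0.98
```

Here the residual sum of squares is 0.10 and the total sum of squares is 5.0, so R2 = 1 − 0.10/5.0 = 0.98.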

Table 2 R2 of various algorithms using different parameter combinations

The R2 of the prediction models ranges from 0.255 to 0.999. The average judgment coefficients of LS-SVM (0.936), GBRT (0.932), MLP (0.901), and RF (0.803) are all greater than 0.800. LS-SVM depends least on the feature combination (R2 of 0.899–0.986), followed by GBRT (0.639–0.969) and RF (0.607–0.966), whereas MLP fluctuates greatly (0.255–0.999). The average judgment coefficients of the other four algorithms are less than 0.800; among them, BPNN has the largest fluctuation, with R2 ranging from 0.581 to 0.917 (Table 2).

4 Optimization and verification of gas emission prediction model

4.1 Gas emission prediction model optimization

4.1.1 Determination of optimal prediction algorithm and feature combination

The average judgment coefficients of seven feature parameter combinations (F-3, F-9, F-11, F-12, F-13, F-14, and F-20) under the various algorithms are all greater than 0.800. The NMSE (Das et al. 2020), MAPE, and TIC of the 28 prediction models formed by the four algorithms and seven feature parameter combinations are calculated by Eqs. (6) to (8), and the results are shown in Table 2 and Fig. 5.

$${\text{NMSE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } }}$$
(6)
$${\text{MAPE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|}$$
(7)
$${\text{TIC}} = \frac{{\sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left( {\hat{y}_{i} - y_{i} } \right)^{2} } } }}{{\sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\hat{y}_{i}^{2} } } + \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {y_{i}^{2} } } }}$$
(8)

Fig. 5 Evaluation index of various algorithms using different parameter combinations
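Equations (6) to (8) can be written as three short Python functions. The sample values are hypothetical and serve only to make the arithmetic checkable; note that, with the definitions of Eqs. (5) and (6), NMSE equals 1 − R2.

```python
import math

def nmse(y_true, y_pred):
    """Normalized mean square error, Eq. (6)."""
    y_mean = sum(y_true) / len(y_true)
    num = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    den = sum((y - y_mean) ** 2 for y in y_true)
    return num / den

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction, Eq. (7)."""
    return sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

def tic(y_true, y_pred):
    """Theil inequality coefficient, Eq. (8); lies between 0 and 1."""
    n = len(y_true)
    rmse = math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n)
    pred_rms = math.sqrt(sum(p * p for p in y_pred) / n)
    true_rms = math.sqrt(sum(y * y for y in y_true) / n)
    return rmse / (pred_rms + true_rms)

# Hypothetical verification values (illustrative only)
y_true = [4.0, 5.0, 6.0, 7.0]
y_pred = [4.2, 4.9, 6.1, 6.8]
print(round(nmse(y_true, y_pred), 3))  # 0.02
print(mape(y_true, y_pred), tic(y_true, y_pred))
```

All three indicators are zero for a perfect prediction, so smaller values indicate a better model.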

The NMSE of LS-SVM is less than 0.1, and its MAPE and TIC are 0.040–0.084 and 0.024–0.063, respectively. The values of NMSE, MAPE, and TIC of GBRT are 0.031–0.308, 0.049–0.166, and 0.037–0.133, respectively. The values of NMSE, MAPE, and TIC of RF are 0.034–0.393, 0.016–0.125, and 0.037–0.114, respectively. The values of NMSE, MAPE, and TIC of MLP are 0.001–0.085, 0.004–0.077, and 0.002–0.060, respectively. Overall, except for the low accuracy of RF, the prediction results of LS-SVM, GBRT, and MLP are satisfactory in terms of both accuracy and volatility (Fig. 5).

4.1.2 Determination of the optimal collaborative forecasting model

MAPE and TIC have similar meanings, so their sum is considered as a combined indicator. The MAPE + TIC values in the green area, where MLP is located, are mostly less than 0.1, followed by those of LS-SVM and GBRT. Finally, 13 prediction models are selected, and the evaluation indexes of the prediction results of each model are shown in Fig. 6.

Fig. 6 Data distribution of added MAPE and TIC

To ensure the stability of the prediction sequence, the prediction models with a maximum relative error (REmax) less than 15% and a mean relative error (MRE) less than 10% are selected as the optimal collaborative prediction models. These are LS-SVM with F-20; GBRT with F-11; and MLP with F-3, F-9, F-11, F-12, F-13, and F-20, eight models in total (Table 3).

Table 3 Evaluation indexes of model prediction results
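The two-stage screening described in this section (MAPE + TIC below 0.1, then REmax below 15% and MRE below 10%) can be sketched as a single filter. The model labels and predictions below are hypothetical placeholders, and MRE is assumed here to be the mean of the absolute relative errors.

```python
import math

def passes_screening(y_true, y_pred, sum_limit=0.1, re_max_limit=0.15, mre_limit=0.10):
    """Return True if the predictions satisfy all three thresholds given in the text."""
    n = len(y_true)
    rel = [abs((y - p) / y) for y, p in zip(y_true, y_pred)]    # relative errors
    mre = sum(rel) / n                                          # also the MAPE of Eq. (7)
    rmse = math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n)
    tic = rmse / (math.sqrt(sum(p * p for p in y_pred) / n)
                  + math.sqrt(sum(y * y for y in y_true) / n))  # Eq. (8)
    return mre + tic < sum_limit and max(rel) < re_max_limit and mre < mre_limit

# Hypothetical verification values and model outputs (illustrative only)
y_true = [4.0, 5.0, 6.0, 7.0]
candidates = {
    ("model A", "F-x"): [4.2, 4.9, 6.1, 6.8],   # small errors
    ("model B", "F-y"): [5.0, 4.0, 7.5, 5.5],   # large errors
}
kept = [name for name, pred in candidates.items() if passes_screening(y_true, pred)]
print(kept)  # [('model A', 'F-x')]
```

Model B is rejected because its largest relative error is 25%, which exceeds the 15% limit on REmax.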

4.2 Verification of optimal collaborative forecasting model

The average of the values predicted by these eight optimized collaborative models is taken as the final predicted value. The evaluation indexes of the prediction are shown in Fig. 7 and Table 4. With the optimized collaborative models, all the evaluation indexes of the gas emission prediction results meet the requirements. The absolute error (AE) and mean relative error (MRE) are calculated by Eqs. (9) and (10), respectively.

$${\text{AE}}={y_{i}-\widehat{y_{i}}}$$
(9)
$${\text{MRE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|}$$
(10)

Fig. 7 Comparison of predicted value and original value

Table 4 Optimization model evaluation indicators
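The averaging of the collaborative models and the error measures of this section can be illustrated as follows. Three hypothetical models are averaged instead of eight to keep the example short, all numbers are invented for illustration, and MRE is taken as the mean of the absolute relative errors.

```python
def collaborative_prediction(model_predictions):
    """Average the predictions of several models sample by sample."""
    n_models = len(model_predictions)
    return [sum(values) / n_models for values in zip(*model_predictions)]

# Hypothetical outputs of three models for four verification samples
model_predictions = [
    [4.2, 4.9, 6.1, 6.8],
    [3.9, 5.2, 5.8, 7.1],
    [4.1, 5.0, 6.2, 7.0],
]
y_true = [4.0, 5.0, 6.0, 7.0]

y_hat = collaborative_prediction(model_predictions)
ae = [y - p for y, p in zip(y_true, y_hat)]                    # Eq. (9), one value per sample
mre = sum(abs(e / y) for e, y in zip(ae, y_true)) / len(y_true)
re_max = max(abs(e / y) for e, y in zip(ae, y_true))
print([round(p, 3) for p in y_hat], round(mre, 4), round(re_max, 4))
```

Averaging tends to cancel the opposite-signed errors of the individual models, which is the motivation for combining several screened models rather than relying on one.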

The maximum relative error (REmax), the minimum relative error (REmin), and the MRE of the predicted sequence in this paper are better than those in Ma (2017), Yan (2020), Wang et al. (2018), and Jing et al. (2011) (Table 5).

Table 5 Comparison of optimization models

5 Conclusions

The use of data mining techniques is of great significance for analyzing the relationship between parameter combinations and machine learning algorithms in the prediction of coal mine gas emission. Through the selection of feature parameter combinations, establishment of prediction models, selection of collaborative models, and verification of the models, the proposed method realizes gas emission prediction under multiple characteristic parameters, algorithms, combinations, and judgment indicators. The main conclusions are presented as follows:

(1) A total of 20 combinations of characteristic parameters of the factors influencing gas emission in the working face are established: one combination each is obtained by the Pearson correlation coefficient method, RFE, and RF, and 17 combinations are determined by full subset regression.

(2) The R2 values of the 160 gas emission prediction models with different combinations of algorithms and feature parameters range from 0.255 to 0.999. Four algorithms, namely LS-SVM, GBRT, RF, and MLP, have average judgment coefficients greater than 0.800.

(3) Eight collaborative models, namely LS-SVM with F-20; GBRT with F-11; and MLP with F-3, F-9, F-11, F-12, F-13, and F-20, can be used for predicting gas emission in the working face. The evaluation indexes of the final predicted value against the original value all meet the requirements.

A new gas emission prediction concept is proposed in this paper: multiple parameter combinations and multiple machine learning algorithms form a group of prediction models. In the future, based on the proposed collaborative prediction model of gas emission in the working face, this concept can be verified with more gas emission influencing factors, algorithms, and sample data sets to optimize the prediction model further.