Abstract

The accurate, efficient, and reliable forecasting of wind speed is a hot research topic in wind power generation and integration. However, available forecasting models focus on forecasting the wind speed using historical wind speed data and ignore multidimensional meteorological variables. The objective is to develop a hybrid model with multidimensional meteorological variables for forecasting the wind speed accurately. The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is applied to handle the nonlinearity of the wind speed. Then, the original wind speed will be decomposed into a series of intrinsic model functions with specified numbers of frequencies. A quadratic model that considers the two-way interactions between factors is used to pursue accurate forecasting. To reduce the model complexity, Gram–Schmidt-based feature selection (GSFS) is applied to extract the important meteorological factors. Finally, all the forecasting values of IMFs will be summed by assigning weights that are carefully determined by the whale optimization algorithm (WOA). The proposed forecasting approach has been applied on six datasets that were collected in Qinghai province and is compared with several state-of-the-art wind speed forecasting models. The forecasting results demonstrate that the proposed model can represent the nonlinearity of the wind speed and deliver better results than the competitors.

1. Introduction

The depletion of conventional fossil fuels and the deterioration of the ecological environment have been receiving increasing concern in recent years. It is inevitable that fossil fuels will be restricted or replaced by renewable energy resources, which include wind, solar, geothermal, and biomass ones [1, 2]. As a type of clean, inexhaustible, and environmentally preferable renewable energy, wind energy has been developed rapidly since the 1990s around the world and has received widespread attention [3]. Based on the annual report by the Global Wind Energy Council, the global market was 52.573 GW of the wind power installed capacity in 2017, thereby resulting in a cumulative installed capacity of 539.581 GW [4]. Figure 1 presents the top 10 countries in the world in terms of wind power installed capacity from January to December 2017. The wind power is a function of the cube of the wind speed; hence, small differences in the wind speed forecast cause large differences in the wind power forecast [5]. Moreover, the wind speed is an important variable that is related to the wind turbine output power, and the accurate forecast of wind speed facilitates the improvement of the efficiency of wind turbines and the enhancement of the stability of the power supply [6]. However, the wind speed exhibits the stochastic fluctuations and irregular intermittency due to the influence of several meteorological factors, and its distribution is not normal. Therefore, it is also critically challenging to forecast the wind speed accurately. Many scientific techniques have been developed for forecasting the wind speed. These algorithms can be divided into five types: physical-based methods, statistical-based methods, persistence methods, intelligent methods, and hybrid methods [7]. Physical-based methods, which are suitable for long-term wind speed forecasts, apply parametric models according to the physical changes in the atmosphere [8], such as the numerical weather prediction (NWP) model [9]. Statistical-based methods, such as statistical regression [10], the autoregressive moving average (ARMA) models [11], Bayesian models [12], and Markov models [13], depend on the historical wind speed data and forecast the future pattern [6]. These models are suitable for forecasting short-term wind speed but have difficulties if the wind speed series exhibit high-frequency variations. Persistence methods, which are used in short-term wind speed forecasts, are based on the assumption that the wind speed at time k+1 will not change significantly from the wind speed at time k [14]. The intelligent methods apply machine learning and artificial intelligence algorithms to forecast the wind speed and include Kalman filtering [15], artificial neural networks (ANNs) [16], support vector machine [17], extreme learning machine (ELM) [18], and high-dimensional feature selection [19]. The most popular intelligent methods for forecasting wind speed are ANNs. Zhang et al. proposed a new hybrid model that combines a shared-weight long short-term memory network (SWLSTM) and Gaussian process regression (GPR), and they applied SWLSTM-GPR to wind speed prediction cases [20]. Li et al. developed an innovative framework for multistep wind speed prediction that uses a recursive neural network that is based on the wind speed and the turbulence intensity. According to experimental results, the proposed model dramatically outperforms conventional machine learning models on multistep wind speed prediction [21].

Currently, the main focus has shifted from single methods to advanced hybrid methods, which combine techniques such as statistical and machine learning approaches and utilize the characteristics of individual methods. Sheela and Deepa used a hybrid computing model that integrates a multilayer perceptron network and self-organizing feature maps for wind speed prediction [2224]. Shukur and Lee proposed a hybrid model that was based on ARIMA for forecasting the daily wind speed in Iraq and Malaysia; to improve the forecasting accuracy, an ANN and a Kalman filter (KF) could be utilized to overcome uncertainty and nonlinearity problems [25]. Doucoure et al. used wavelet decomposition and ANNs with predictability analysis to forecast wind speed time series [26]. Liu et al. developed a novel wind speed prediction model that was based on the WPD (wavelet packet decomposition), a convolutional neural network and a convolutional long short-term memory network, and they demonstrated that the proposed model can perform best in wind speed 1-step to 3-step predictions [27]. Zhu et al. proposed a wind speed forecasting model by integrating Gaussian process regression with a long short-term memory network, and the results demonstrated that the proposed method improves the forecast accuracy [28]. Cai et al. proposed a novel filtering strategy that integrates the statistical predictors and the NWP model outputs into a unified framework. Based on the proposed filtering strategy, a combined predictor, namely, SVR-SDA-UKF (support vector regression, stacked denoising autoencoder, and unscented Kalman filter), is proposed and validated [29].

In addition, wind speed time series exhibit nonlinearity and nonstationarity due to the weather and geographical factors. This may lead to incorrect conclusions regarding the activity of the wind speed when the techniques are used to forecast [15]. Therefore, several researchers have attempted to improve the forecasting precision via the addition of a preprocessing stage for time series decomposition and feature selection [30]. Time series decomposition approaches have been widely used to decompose the original complex-structured series into several simple series, such as the wavelet transform (WT) [31] and variational mode decomposition (VMD) [32]. Feature selection approaches facilitate the selection of suitable inputs for the forecasting models, which enhances the performances of the models substantially. Various feature selection approaches have yielded satisfactory results, such as Gram–Schmidt orthogonalization (GSO) [33], the least absolute shrinkage and selection operator (LASSO) [19], and forward regression [34].

The main contribution of this paper is the developed hybrid forecasting model, which uses random forest, a Gram–Schmidt-based feature selection method, and the whale optimization algorithm. The proposed method has inherited all the advantages of the above approaches, which are embedded together. Instead of combining them in parallel, random forest and the Gram–Schmidt-based feature selection method are embedded in the whale optimization algorithm; this embedding is a challenging task in terms of computation. The objective function value of the whale optimization algorithm is determined by these two methods. The details will be presented in Section 2, which describes the use of the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN), random forest (RF), the Gram–Schmidt-based feature selection method (GSFS), and the whale optimization algorithm (WOA) to solve several chained problems.(i)The first methodology (CEEMDAN in Section 2.1) addresses the problem that the signals that are detected from the measured objects are often mixed with various levels of noise. Wavelet analysis can filter out weak or useless signals and extract the information that users need. This paper introduces a data preprocessing strategy that is designed based on the theory of decomposition and reconstruction; it filters the noise from the original wind speed efficiently and searches for the intrinsic values of the wind speed data. Finally, this preprocessing procedure utilizes wavelet packet decomposition for denoising and multiscale wavelet decomposition to complete data preprocessing, and it guarantees the accurate forecasting of the wind speed.(ii)The second methodology (RF in Section 2.2) is a machine learning model that is utilized as a forecasting model after selecting all the important variables. This forecasting model is stable since it weakens the correlations between variables to address the multicollinearity issue. Furthermore, it is designed based on bootstrapping, which theoretically guarantees its stability and accurate forecasting results. In summary, compared with traditional programming methods, the utilized machine learning algorithm simplifies the code and improves the execution performance of the code. Second, machine learning technology can be used to solve problems that cannot be solved via traditional methods. Furthermore, machine learning can adapt to new data; namely, the system can be personalized according to the environment.(iii)The third methodology (GSFS in Section 2.3) is an effective and efficient variable selection approach that is used to select the important variable after the preprocessing procedure. This approach is designed based on an orthogonalization method, which makes the selected variables orthogonal with each other. Furthermore, in contrast to other penalized variable selection methods, this novel approach does not introduce any bias into the model; hence, all nuisance variables can be eliminated effectively.(iv)The fourth methodology (WOA in Section 2.4) is a whale optimization algorithm, which is a new swarm intelligence optimization algorithm that simulates the predation behavior of humpback whales in the ocean. The algorithm has the advantages of simple structure, few parameters, strong search performance, and easy implementation. It is applied in the proposed model in an embedded way to determine the optimal weights for assignment to machine learning models. The fitness function in WOA is determined by the machine learning model and the variable selection method.

The hybrid forecasting model, namely, RF-GSFS-WOA, is proposed in Section 3. The experimental results and the corresponding analysis are presented in Section 4. Finally, Section 5 presents the conclusions of this study.

2. Materials and Methods

In this section, four techniques, namely, CEEMDAN, random forest, GSFS, and WOA, will be introduced. The proposed methodology is developed based on these important techniques.

2.1. Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) originates from the ensemble empirical mode decomposition (EEMD) algorithm. To solve the mode mixing problem, adaptive white noise that has a specified standard deviation can be added at every stage of the decomposition, and a unique residue can be calculated to generate each mode. Moreover, the decomposition is completed with a negligible error; hence, the CEEMDAN provides superior spectral separation of the modes and an accurate reconstruction of the original signal at a lower computational cost [35, 36].

The process of the CEEMEDAN is as follows.Step 1: add realizations of a white Gaussian noise series and generate the corresponding signal:where is the original signal, is the number of the trials, is the noise standard deviation, and denotes the realizations of the white Gaussian noise series.Step 2: decompose realizations via EMD to obtain their first intrinsic model function (IMF):Step 3: calculate the first residue (at the first stage ):Step 4: decompose the noise-added residues until their first EMD mode and define the (k+1)-th IMF of the CEEMDAN:where the operator produces the k-th mode of the EMD.Step 5: calculate the -th residue:where , is the total number of modes.Step 6: repeat step 4 and step 5 for the next k until the obtained residue is not suitable for decomposition. The final residue is expressed asHence, the original signal can be expressed as

2.2. Random Forest

Random forest is an ensemble learning method that uses decision trees as base learners and measures the importance of each variable. Random forest is a variant of bagging (bootstrap aggregating), which considers all attributes or variables when establishing a base learner. Decision trees tend to overfit the data, thereby leading to low bias but high variance. To overcome this drawback, random forest constructs decision trees and integrates the results of all individual trees to generate the final outcome [37]. In classification, the mode of the classes that are predicted by the trees is adopted as the output, while, in regression, the mean of the predictions of all individual trees is used.

Since bootstrapping is used to generate the sample from which an individual tree is grown, only approximately 63.2% of the observations are used to produce each base learner, and the remainder of the training set can be used as a validation set: suppose that there are samples. The probability that not all the samples are chosen in the training set is which is if . For data , with representing the observations and representing the response variable, the out-of-bag (OOB) estimate, which is denoted by , is produced by base learners [32, 33]. Given an ensemble of classifiers , it is defined as follows:where is the set that consists of all possible values of the response and is the subsample that is applied to produce base learner . The OOB estimate of the generalization error is expressed aswhere is the training set.

Both bagging and random forest measure the importance of each variable via the following procedure: (1) calculate the OOB error of each observation and average them to obtain the OOB error of the forest. (2) Permute the values of the j-th feature among the training set. (3) Compute the OOB error again on this perturbed data set, and the difference in the OOB error before and after the permutation can be used to measure the importance of the j-th variable, where a larger difference corresponds to higher importance of variable j.

2.3. Gram–Schmidt-Based Feature Selection Method (GSFS)

Feature selection substantially facilitates the reduction of the model complexity and the enhancement of the forecasting accuracy when establishing a forecasting model. Many feature selection methods are designed based on penalized regression, which shrinks the estimate; these methods yield inaccurate results. In this study, a subset selection method, namely, the Gram–Schmidt-based feature selection method (GSFS) [38], is used for feature selection. The objective of GSFS is to identify all the relevant features that contribute to the model forecasting. The details of GSFS are presented as follows.

GSFS iterations: let be the empty set and initialize and to 0. Here, denotes the initial forecasting values. Calculate the pseudoresponse and select the optimal variable by regressing on :where . Then, and let . The objective in building is to add the optimal variables into the empty set.

In the next steps, we will conduct orthogonalization before selecting each new feature. If , orthogonalize to via the Gram–Schmidt method. Similar to the first step, compute the pseudoresponse and considerwhere . Update and . The objective of this procedure is to select variables that are orthogonal with each other sequentially and add them into an empty set.

The main advantage of GSFS is that it does not involve any penalized regression. It detects the relevant features by examining each feature sequentially. Although it may take more time, it will not miss any important data information. In the second step, GSFS computes the pseudoresponse as the target vector, which can be used to find the most strongly correlated features with the pseudoresponse. This pseudoresponse differs from the ordinary response that is applied in regression or classification problems. It is updated in every iteration of GSFS to eliminate the effects of selected features. An orthogonalization procedure that is based on the Gram–Schmidt method is applied to make all the features orthogonal with each other. This procedure removes the dependencies among features and shortens the computation time for inverting the projection matrix.

The determination of plays an important role in guaranteeing the accuracy and the selection consistency. Let ; namely, let the number of features increase with the sample size. We must impose the following regularity condition before proving the theorem.(C1). It follows that .Assume that the noise term is independent of each feature and let and be constants; namely,(C2) for Let , which represents the noise level and let and . Assume that there exists such that(C3).(C4).(C5)There exists such that and .

Theorem 1. Suppose that regularity conditions (C1)–(C5) are satisfied. Let and Then, , where denotes the set of important variables.

Theorem 1 establishes the selection consistency of the GSFS method and provides a theoretical guarantee on the choice of . Due to the selection consistency, the feature selection method can detect all the important features as the sample size approaches infinity. Here, contains the features that are selected in step . To select the optimal model with all the important features, high-dimensional information criteria (HDIC) will be applied after GSFS. The proof of Theorem 1 is similar to [34] and is omitted here.

2.4. Whale Optimization Algorithm (WOA)

In 2016, Mirjalili and Lewis [39] proposed the whale optimization algorithm (WOA), which is a novel metaheuristic that mimics the bubble-net foraging behavior of humpback whales in searching for their prey. It is illustrated in Figure 2(a). The humpback whales attack small fishes or schools of krill that are close to the surface by swimming in a shrinking circle and creating distinctive bubbles along a spiral-shaped path. The algorithm includes two phases: the first phase is encircling a prey and applying the spiral bubble-net attacking method, and the second phase is searching randomly for a prey. The details of each phase are presented as follows.

2.4.1. Encircling Prey

The humpback whales dive downwards and generate bubbles in a spiral shape, which makes the prey (typically krill) feel threatened and, hence, gather together. It is assumed that there are N humpback whales in a d-dimensional search space. The position of each whale is denoted by . The position of the prey represents the globally optimal solution to the problem. The humpback whales can identify the prey’s position and encircle it. However, there is no priori information about the optimal solution. Therefore, in the WOA, the current best position is adopted as the position that corresponds to the optimal solution. Once this position has been identified, other humpback whales will gather around it and update their position as follows [36]:where is the current iteration, is the current position, is the current best position, and and are the coefficient vectors:where and are 2 random numbers that are in the interval (0, 1) and is a convergence factor, which decreases linearly as the iteration proceeds. Define as the iteration number and as the maximum allowed number of iterations. Factor is defined as

Two mechanisms are designed to describe the hunting behavior of the humpback whales, namely, the shrinking encircling mechanism and the spiral position updating mechanism, as illustrated in Figures 2(b) and 2(c). Shrinking encircling is realized as the convergence factor decreases, while, in spiral position updating, the spiral movement of the humpback whales is simulated by the following:where is the distance between the humpback and the prey, is the shape parameter of the spiral, and is a random number that is in the interval (−1, 1). Since the humpback whales perform these two behaviors simultaneously, in each iteration, each of these mechanisms has a probability of 0.5 to be utilized. The corresponding model is as follows:where is a random number in (0, 1).

2.4.2. Search for Prey

To enhance the exploration in WOA, instead of updating the positions of the humpback whales based on the best position so far, a random position is chosen to guide the search accordingly. with random values that are greater than 1 or less than −1 is applied to force the position to move far away from the best position.

The model of searching for prey is defined as [40]where is a random position (or a random whale) that is chosen from the current population.

3. Proposed Hybrid Forecasting Model: RF-GSFS-WOA

In this paper, the proposed RF-GSFS-WOA is defined by the following procedure.Step 1: the original wind speed time series is decomposed into I IMFs (let IMFI = R (n)) by following Step 1–Step 6 in Section 2.1.Step 2: a quadratic model (QM) is established for each :where are variables (first order) in the original dataset; are quadratic variables (second order) that are constructed by calculating the component-wise product of and ; are the corresponding coefficients of the variables and quadratic variables; and are independent Gaussian noise errors that follow . quadratic models will be obtained, which are denoted by . As variables are included in the QM, the computational cost is high even if the value of is moderate. Therefore, a variable selection method is urgently needed to reduce the computational cost.Step 3: to reduce the computational cost of QM in Step 2, GSFS is used to complete the variable selection task. With slight abuse of notation, is denoted by . Let , where represents the initial forecasting value, , and the optimal variables are selected by regressing on , which is expressed aswhere . Then, and let . In the next steps, the data will be orthogonalized. If , orthogonalize to via the Gram–Schmidt method. Similar to the first step, compute the pseudoresponse and considerwhere . Update and .Step 4: random forest model is built over the variables that were selected in Step 3.Step 5: assign a weight to each , and the final forecast model is provided by the ensemble model (EM):with , where is obtained via the WOA algorithm.

The RF-GSFS-WOA algorithm is presented in detail as Algorithm 1. The proposed combination forecasting method, namely, RF-GSFS-WOA, has the following advantages. Instead of applying a single decision tree, RF grows many decision trees and, hence, can handle the correlation among the features. It bootstraps samples, which promotes the diversity of the base learners and, consequently, reduces the number of variables of the model. Furthermore, RF outperforms bagging as the number of base learners increases. It trains fast since it applies a fraction of the features instead of considering all possible features. The time that is required for training RF is often shorter than that for bagging. The overfitting problem of RF can be controlled well via the selection of a suitable number of features in each decision tree. GSFS is a greedy feature selection method, which is a type of forward selection method. In contrast to penalized feature selection methods such as LASSO and SCAD, it does not introduce any bias into the model. WOA is a metaheuristic method that can obtain the globally optimal solutions of nonlinear optimization problems. A flowchart of the proposed RF-GSFS-WOA is presented in Figure 3. The RF-GSFS-WOA algorithm is presented in the Appendix.

(i)INPUTS: , the wind speed dataset.
(ii)PARAMETERS:
 Nstd: noise standard deviation in CEEMDAN
 NR: number of realizations in CEEMDAN
 MaxIter: maximum number of sifting iterations that are allowed in CEEMDAN
  q: variable that is selected in GSFS
  MaxIterRF: maximum number of iterations in random forest
  NTress: number of trees in random forest
  N: number of whales in WOA
  MaxIterWOA: maximum number of iterations in WOA
(iii)OUTPUTS : the forecasting error
(1)The original dataset is divided into training dataset and test dataset .
(2)Decomposition using CEEMDAN, which results in I IMFs: .
(3)Establish a quadratic model of and , which results in and
(4)INITIALIZATION: set , .
(5) IF DO
(6)  Calculate the pseudoresponse
(7)  Perform regression on the data with a pseudoresponse and select the most strongly correlated variable
(8)  Update the response and .
(9) END IF
(10) FOR to DO
(11)  Orthogonalize via the Gram–Schmidt method.
(12)  Perform regression on the data with a pseudoresponse and select the most strongly correlated variable
(13)  Update the response and .
(14) END FOR
(15)Build the solution path and use HDIC to select the optimal solution, including q important variables.
(16)Grow a random forest with NTress = 3000 trees using the selected variables with MaxIterRF iterations.
(17)Initialize the whale population for weight matrix in which the columns are weights for established RF models.
(18)Calculate the fitness function using and RF models
(19)Let be the current best weights
(20)WHILE DO
(21) FOR each weight DO
(22)  Update and
(23)  IF THEN
(24)  IF THEN
(25)  Update the positions of the current weight via (12) and (13)
(26)  ELSE IF THEN
(27)  Randomly select a search agent
(28)  Update the position of the current search agent via (17)
(29)  END IF
(30) ELSE IF THEN
(31)Update the position of the current weight via (16)
(32) END FOR
(33)Check if any search agent goes beyond the search space
(34)Calculate the fitness function of each weight
(35)Update if there is a better solution
(36)
(37)END WHILE
(38)Obtain the optimal weights
(39)Calculate the forecasting error using the optimal weights and the RF model that is generated by and .

4. Case Studies

This section describes the data collection process, establishes the forecasting assessment criteria, and presents, discusses, and analyzes the experiment results. The main objective of the experiments is to evaluate the performance of the proposed hybrid forecasting model, namely, RF-GSFS-WOA.

4.1. Data Collection

Qinghai province is located in the northeast of the Qinghai-Tibetan Plateau, and most of the area is rich in wind energy resources; hence, it is suitable for the installation of large-scale wind farm. Thus, it is meaningful to investigate sites in this region. The hourly data were collected between 0 : 00 and 24 : 00 from 1/1/2014 to 12/31/2014 at six sites in the Qinghai region. The dataset that is used in this study was collected by National Renewable Energy Laboratory (NREL) and is available at https://www.nrel.gov/gis/tools.html. To simplify the description, the six sites are denoted by Site 1, Site 2, Site 3, Site 4, Site 5, and Site 6. The longitudes and latitudes of these locations are specified in Figure 4. Five meteorological factors, namely, the temperature (T), pressure (P), relative humidity (RH), wind precipitable water (PW), and wind direction (WD), are considered in the forecasting of the wind speed for the Qinghai region, which facilitates the design of wind power plants to efficiently generate electricity for the local population [41]. There are four seasons in each year: spring (Mar., Apr., and May), summer (Jun., Jul., and Aug.), autumn (Sep., Oct., and Nov.), and winter (Dec., Jan., and Feb.). The data are divided into training data and forecasting data as follows: 20 days in each season are randomly chosen as the training data (20 days × 24 hours/days = 480) for building the model, and the forecasting data include 5 days (5 days × 24 hours/day = 120) that are randomly chosen from the remaining days in each season. The performances of various forecasting models are evaluated in the four seasons. The training sets and forecasting sets for the six sites are described in Table 1. The forecasting sets account for approximately 20% of the total data, which is reasonable.

4.2. Forecasting Assessment Criteria

To evaluate the results of the forecasting models quantitatively, four assessment measures, namely, the mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), and Theil inequality coefficient (TIC), are applied. All are calculated between the actual data and the forecasted values, and the smaller their values, the better the forecasting results. These criteria are expressed as follows:where is the size of the forecasting sample and and are the actual value and the forecasted value, respectively, for time period .

4.3. Experiment Results and Corresponding Analysis

The experimental settings are described in detail in Table 2. The parameters in the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) are set as follows: the noise standard deviation (Nstd) is set as 0.2, the number of realizations (Nr) is set as 500, and the maximum allowed number of sifting iterations (MaxIter) is set as 500. The regularization parameters in LASSO and SCAD are chosen via 10-fold cross-validation. In the experiments on the neural networks, the maximum number of iterations (MaxIterNN) is set as 2000 for both BP and Elman. Each activation function from the hidden layer to the output layer is set as the Tansig function (Func). The number of hidden neurons is crucial since it mainly determines the network complexity. To determine the optimal number of hidden neurons, values are selected from the following set: . Furthermore, Elman is trained using backpropagation through time, and the corresponding weights are selected based on a uniform distribution. Based on the experimental results, the gradient descent with momentum is set as 0.82, and the adaptive learning rate is set as 0.02. In GSFS, we set q as 30; hence, 30 variables will be chosen after the GSFS procedure, and, thus, there are 1200 variables in the quadratic model. The number of trees in RF is set as 3000, and the maximum number of iterations (MaxIterRF) is 500. RF is implemented using MATLAB function TreeBagger. The fitness function of WOA is selected as “F1,” which is the square error loss function, and the range is (0, 1). The weights are determined after normalization such that the sum of all the weights is 1.

To evaluate the forecasting performance of the proposed hybrid model for forecasting hourly wind speed, experiments for six sites and four seasons are designed in this paper. Four types of assessment criteria are used in all experiments to evaluate the performance. The proposed hybrid forecasting model is RF-GSFS-WOA, and six forecasting models, namely, an Elman neural network, Elman-LASSO, Elman-SCAD, a BP neural network (BP), BP-LASSO, and BP-SCAD, are applied as comparison models to evaluate the performance of the developed hybrid model. Wind speed data exhibit volatility or instability due to changes in the weather; hence, this study conducts a preprocessing step before building all the models; namely, the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to eliminate the noise and smooth the wind speed time series data before the developed hybrid model and six comparison models are established for hourly wind speed forecasting in four seasons at six sites.

In each season, 20 days of hourly wind speed time series are selected randomly as the training phase for the construction of the models, and 5 days of wind speed time series are selected randomly as the forecasting phase. Thus, 80% of the data will be used as training data; the remaining data (20%) will be used as test data. The reason for this splitting scheme is that the model can be trained sufficiently with more observations (80% of the data) and overfitting can be avoided. However, the number of observations in the test data is also important: the test accuracy cannot be evaluated accurately if the number of observations is too small (less than 20%). The multiple output strategy (5 days are used as outputs), which is one of the multistep forecasting techniques, is applied in the experiments.

The wind speed is influenced by several meteorological variables. Consequently, seven forecasting models consider multidimensional meteorological factors, such as the temperature, pressure, relative humidity, precipitable water, and wind direction, as dependent variables in multivariate models. The use of multivariable inputs increases the difficulty of determining the model parameters.

Four performance criteria, namely, MAPE, RMSE, MSE, and TIC, for each forecasting model in the four seasons at the six sites, are listed in Tables 35. Figures 57 present the forecasting results of the seven models for observation sites 1–6, which successfully illustrate that the proposed hybrid model RF-GSFS-WOA yields closer results to the real wind speed data than any other forecasting model in all instances. Furthermore, the developing hybrid framework realizes the best MAPE, RMSE, MSE, and TIC values. For hourly wind speed forecasting with multidimensional meteorological variables, the forecasting performance is described in detail in Tables 35 and Figures 58, from which the following findings are obtained.(I)From Table 3 and Figure 5, it is found that Elman-based models, namely, Elman, Elman-LASSO, and Elman-SCAD, outperform the BP-based models, namely, BP, BP-LASSO, and BP-SCAD, and that Elman and BP yield the worst forecasting results. For instance, at Site 1, the developed hybrid model, namely, RF-GSFS-WOA, outperforms the other six forecasting models. The RF-GSFS-WOA model realizes the smallest MAPE of 2.44% in summer, and similar results are obtained in terms of the MAE, RMSE, and TIC criteria.(II)According to Table 4 and Figure 6, the proposed RF-GSFS-WOA still outperforms other forecasting models by realizing lower values of MAE, RMSE, MAPE, and TIC. The forecasting models that were established based on Elman substantially outperform the models that were constructed based on BP. Elman-SCAD outperforms BP-SCAD. For example, the MAPE of Elman-SCAD is 7.61, which is critically lower than that of BP-SCAD, which is 10.37 in spring at Site 3. In autumn at Site 4, the RMSE of Elman-SCAD is 0.27, which is approximately 34.15% lower than that of BP-SCAD. Furthermore, similar to the results from Table 3 and Figure 5, the SCAD-related forecasting models outperform the LASSO-related forecasting models. This is mainly because SCAD can better handle the collinearity of variables than LASSO. Similar phenomena are observed in Table 5 and Figure 7. RF-GSFS-WOA realizes remarkable forecasting performances.(III)According to Figure 8, which presents the average MAPE and RMSE values of the compared forecasting methods in the four seasons, RF-GSFS-WOA realizes the lowest MAPE values over all seasons, which are 4.04%, 3.73%, 3.02%, 3.08%, 2.81%, and 4.13% at the six sites. Elman-SCAD still outperforms Elman-LASSO and BP-SCAD. The worst value is obtained by BP, which is approximately 20%. BP-LASSO and BP-SCAD slightly outperform BP, which does not perform as well as Elman.

Therefore, from the discussions above, the proposed hybrid forecasting model, namely, RF-GSFS-WOA, performs well in the wind speed forecasting task by combining the advantages of machine learning module RF, feature selection approach GSFS, and metaheuristic method WOA.

4.4. Statistical Test

To further evaluate the performance of the proposed forecasting model, a nonparametric statistical test, namely, the Friedman test, is conducted. The forecasting performances of the compared methods differ among the six sites [42]. The following hypothesis, namely, the null hypothesis , along with the alternative hypothesis , is considered in the Friedman test:where , with representing the j-th rank of algorithms on a dataset of size B. The Friedman statistic is defined as follows [43]:

This statistic follows chi-square distribution . An F-distribution statistic, namely, , is calculated as follows:

Suppose that there are significant differences among the comparing forecasting methods. Then, the null hypothesis will be rejected. If this occurs, critical differences (CDs) will be applied for further comparisons, which is known as a post hoc test. A Bonferroni-Dunn test is used with CD values , where is determined as 2.291 in [44]. The Friedman test is described in detail in Table 6. All the values are much larger than the critical value  = 2.42. This suggests that the null hypothesis should be rejected and a post hoc test is needed. Using the above-described method, the CD value is calculated as , which is used as a benchmark for comparison. The proposed RF-GSFS-WOA significantly outperforms BP-LASSO, BP, and Elman since the differences between the average ranks of these methods exceed 2.85. For instance, the difference between RF-GSFS-WOA and BP-LASSO is 4.67 for MAE, which is larger than 2.85. Compared to other methods, the proposed method does not show strong advantages in forecasting. However, based on the above analysis, the proposed method delivers remarkable forecasting results.

5. Conclusions

Accurate wind speed forecasting has attracted the attention of researchers of wind energy for decades. This paper proposed and investigated a novel hybrid wind speed forecasting model that combines the advantages of CEEMDAN, GSFS, RF, and WOA. The proposed model has been applied to data from six sites in Qinghai province to forecast the wind speed. The forecasting performances are evaluated using four statistical measures: MAE, RMSE, MAPE, and TIC. The forecasting results demonstrate that the proposed hybrid model outperforms other compared forecasting models, namely, Elman, Elman-LASSO, Elman-SCAD, BP, BP-LASSO, and BP-SCAD, which are famous wind speed forecasting models and variable selection methods. Moreover, the Elman-based forecasting models deliver better results than the BP-based models, and the SCAD-based models outperform the LASSO-based models. Furthermore, by using variable selection methods such as GSFS, LASSO, and SCAD, the model performances are boosted substantially. Thus, variable selection is necessary since it can be used to select the meteorological factors that make the most important contributions to the forecasting model while discarding useless factors. To further evaluate the performance of the proposed model, a nonparametric test, namely, the Friedman test, is conducted to evaluate the forecasting results. The test results are consistent with the findings of the analysis of the results. Consequently, according to the experimental results, the proposed hybrid model forecasts the wind speed accurately and efficiently [38, 45].

Data Availability

The data that are used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant nos. 71901109, 71761016, 61973145, 71861012, and 18ATJ001), Natural Science Foundation of Jiangxi, China (no. 20181BAB211020), Jiangxi Double Thousand Plan, Postdoctoral Foundation of Jiangxi Province (no. 2018KY08), Scientific Research Fund of Jiangxi Provincial Education Department (Grant nos. GJJ180287, GJJ180247, and GJJ190264), and Human and Social Science Foundation of Jiangxi Province (no. TJ19202).