Abstract

The need for efficient power sources to operate modern industry has been rapidly increasing in the past years. However, the output of the latest renewable power sources is difficult to predict, because the generated power depends strongly on fluctuating factors (such as wind bearing, pressure, wind speed, and humidity of the surrounding atmosphere). Thus, accurate forecasting methods are of paramount importance to be developed and employed in practice. In this paper, a case study of a wind harvesting farm is investigated in terms of collected wind speed data. For data like wind speed that are hard to predict, a well-built and tested forecasting algorithm must be provided. To accomplish this goal, four neural network-based algorithms: artificial neural network (ANN), convolutional neural network (CNN), long short-term memory (LSTM), and a hybrid convolutional LSTM (ConvLSTM) model that combines LSTM with CNN, and one support vector machine (SVM) model are investigated, evaluated, and compared using different statistical and time indicators to ensure that the final model meets the goal it is built for. Results show that even though SVM delivered the most accurate predictions, ConvLSTM was chosen due to its lower computational effort combined with high prediction accuracy.

1. Introduction

The need to move towards renewable and clean energy sources has increased considerably over recent years. Fossil fuels are being consumed excessively and will eventually be depleted. In contrast, renewable energy (RE) sources such as wind, solar, and hydroelectric power are regularly replenished and are effectively inexhaustible. Grid operators who use RE face many challenges that lead to variability and uncertainty in power generation. For instance, in the case of solar power, clouds moving above solar power plants can reduce power generation for brief intervals of time. Cloud cover may introduce a very quick shift in the output of solar systems, but solar energy is still considered highly predictable, as the motion of the sun is clearly understood [1]. Wind power generation, however, is less predictable, because fluctuations in wind speed are stochastic in nature. This issue can cause a mismatch between supply and demand. Therefore, in order to enhance and optimize renewable wind power generation, wind speed or power production forecasting models are increasingly being used to resolve this problem. This has led to a huge increase in the installation of wind power plants [2].

As the demand for wind power has increased over the last decades, there is a serious need to set up wind farms and construct facilities based on accurately forecasted wind data. Short-term wind forecasts have a significant effect on the electricity market [3] and are also necessary to determine the size of wind farms.

There is clearly a need for an accurate wind forecasting technique to substantially reduce the cost of wind power scheduling [4]. Several methods are aimed at short-term wind forecasting (e.g., statistical time series and neural networks). For more advanced and more accurate forecasting, hybrid models are used. These models combine physical and statistical approaches, short- and medium-term models, and combinations of alternative statistical models.

The concept of artificial neural networks (ANNs) was first introduced by McCulloch and Pitts [5] in 1943 as a computational model for biological neural networks. The convolutional neural network (CNN) was influenced by the “Neocognitron” networks first introduced by Fukushima in 1980 [6]. The CNN was based on biological processes: hierarchical multilayered neural networks used for image processing. These networks are capable of “learning without a teacher” to recognize various stimulus patterns based on their geometrical features [7].

Long short-term memory (LSTM) [8] is built upon recurrent neural network (RNN) structure. It was designed by Hochreiter and Schmidhuber in 1997. LSTM uses the concept proposed in [9] which depends on feedback connections between its layers. Unlike standard feedforward neural networks, LSTM can process entire sequences of data (such as voice or video) and not just single data points (such as images).

Support vector machine (SVM) [10] is a popular machine learning technique that is advanced enough to deal with complex data. It was originally aimed at classification problems and was later extended to regression tasks.

In 2016, convolutional LSTM (ConvLSTM) was used to build a video prediction model by Shi et al. [11]. A tool was developed to predict action-conditioned video by modeling pixel motion, predicting a distribution over pixel motion from earlier frames. Stacked convolutional LSTMs were employed to generate motion predictions. This approach achieved the best results in predicting future object motion.

An end-to-end learning of driving models was developed in [12] using an LSTM-based algorithm. A trainable structure for learning to accurately predict a distribution over upcoming vehicle motion was developed by learning generic vehicle motion from large-scale crowd-sourced video. The data source used a rapid monocular camera, observations, and past vehicle state. The images were encoded through a long short-term memory fully convolutional network (FCN-LSTM) to determine the relevant graphical representation in every input frame, alongside a temporal network that exploits motion history information. The authors were able to compose an innovative hybrid structure for time-series prediction (TSP) that combined an LSTM temporal encoder with a fully convolutional visual encoder.

Various papers on wind speed forecasting have been explored in the literature. For instance, a model was introduced by Xu et al. [13] to predict short-term wind speed using LSTM, empirical wavelet transformation (EWT), and Elman neural network approaches. The EWT is implemented to decompose the raw wind speed data into multiple sublayers, which are then fed to an Elman neural network (ENN) and an LSTM network to predict the low- and high-frequency sublayers, respectively. An unscented Kalman filter (UKF) along with a support vector regression (SVR) based state-space model was applied by Chen and Yu [14] to efficiently correct short-term estimates of the wind speed series.

A nonlinear-learning scheme of deep learning time series prediction, EnsemLSTM, was developed by Chen et al. [15]. This scheme relied on LSTMs, a support vector regression machine (SVRM), and an extremal optimization algorithm (EO). Wind speed data are forecasted separately by an ensemble of LSTMs with different numbers of hidden layers and hidden neurons. The authors showed that the introduced EnsemLSTM is capable of achieving improved forecasting performance, with the lowest mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), and the highest R-squared (R2).

A hybrid model constructed of wavelet transform (WT) and SVM was proposed by Liu et al. [16] to predict wind speed in the short term. The model is improved by a genetic algorithm (GA), which is implemented to tune essential parameters of the SVM by reducing the produced errors and searching for the optimum parameters to avoid the risk of instability. The presented model is shown to be more efficient than the SVM-GA model. Wang [17] developed a genetic algorithm wavelet neural network (GAWNN) model. The developed model showed enhanced performance compared to the normal wavelet neural network (WNN) model in predicting short-term wind power, both in the early stage of network training and in convergence precision.

A prediction model was proposed by Sheikh et al. [18] based on support vector regression (SVR) and neural network (NN) with backpropagation technique. A windowing data preprocessing was combined with cross and sliding window validations in order to predict wind speed with high accuracy. A hybrid method was presented by Nantian et al. [19], which included variational mode decomposition (VMD), partial autocorrelation function (PACF) feature selection, and modular weighted regularized extreme learning machine (WRELM) prediction. The optimal number of decomposition layers was analyzed by the prediction error of one-step forecasting with different decomposition layers.

A robust forecasting model was proposed by Haijian and Deng [20] by evaluating seasonal features and lag space in wind resource. The proposed model was based on the multilayered perceptron with one hidden layer neural network using the Levenberg–Marquardt optimization method. Least squares support vector machine (LSSVM) was used by Xiaodan [21] for the wind speed forecasting. The accuracy of the prediction model parameters was optimized utilizing the particle swarm optimization (PSO) to minimize the fitness function in the training process. Ningsih et al. [22] predicted wind speed using recurrent neural networks (RNNs) with long short-term memory (LSTM). Two optimization models of stochastic gradient descent (SGD) and adaptive moment estimation (Adam) were evaluated. The Adam method was shown to be better and quicker than SGD with a higher level of accuracy and less deviation from the target.

A nonlinear autoregressive neural network (NARNET) model was developed by Datta [23]. The model employed univariate time series data to generate hourly wind speed forecast. The closed loop structure provided error feedback to the hidden layer to generate forecast of the next point. A short-term wind speed forecasting method was proposed by Guanlong et al. [24] using a backpropagation (BP) neural network. The weight and threshold values of BP network are trained and optimized by the improved artificial bee colony algorithm. Then, the gathered samples of wind speed are trained and optimized. When training is finished, test samples are used to forecast and validate.

Fuzzy C-means (FCM) clustering was used by Gonggui et al. [25] to forecast wind speed. The input data of the BP neural network with similar characteristics are divided into corresponding classes, and a different BP neural network is established for each class. The coefficient of variation is used to illustrate the dispersion of data, and statistical knowledge is used to eliminate input data with large dispersion from the original dataset. Artificial neural networks (ANNs) and decision trees (DTs) were used by ZhanJie and Mazharul Mujib [26] to analyze meteorological data for the application of data mining techniques through cloud computing in wind speed prediction. The neurons in the hidden layer are increased gradually, and the network performance in the form of an error is examined. Table 1 highlights the main characteristics of the existing schemes developed for wind speed forecasting.

The novelty of this work lies in enhancing the accuracy of wind speed forecasting by using a hybrid model called ConvLSTM and comparing it with four other commonly used models with optimized lags, hidden neurons, and parameters. This includes testing and comparing the performance of these five different models based on historical data, as well as employing the multi-lags-one-step (MLOS) ahead forecasting concept. MLOS provided efficient generalization to new time series data and thus increased the overall prediction accuracy. The remainder of this paper is organized as follows. Section 2 describes the four learning algorithms in addition to a hybrid algorithm investigated for accurate wind speed forecasting. Section 3 illustrates the study methodology. Section 4 describes a real case study of a wind farm. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.

1.1. Acronyms and Notations

Table 2 lists the acronyms and notations used throughout the paper.

2. Prediction Algorithms

In this section, the algorithms used for wind speed forecasting are summarized as follows.

2.1. LSTM Algorithms

LSTM is built on a unique architecture that empowers it to forget unnecessary information, by turning multiplication into addition and using a function whose derivative can stay away from zero over a long range, in order to reduce the vanishing gradient problem (VGP). It is constructed of a sigmoid layer which takes the inputs h_{t−1} and x_t and decides, by generating values between zero and one, which parts of the old output should be removed. This process is done through the forget gate f_t, whose output is given as f_t = σ(W_{xf} x_t + W_{hf} h_{t−1} + b_f). After that, a vector of candidate values from the new input is created by a tanh layer. The forget gate output is multiplied with the old memory c_{t−1}, and the gated candidate values are added, to renew the old memory. Finally, a sigmoid layer decides which portions of the cell state will be the output; its result is multiplied by the cell state squashed through tanh. Thus, the output consists of only the parts that were selected to be generated.

LSTM networks [8] belong to the family of recurrent neural networks (RNNs), which are capable of learning long-term dependencies and are powerful for modeling long-range temporal structure. The main component of the LSTM network is the memory cell, which can memorize the temporal state. The cell content is shaped by the addition or removal of information through three controlling gates: the input gate, the forget gate, and the output gate. LSTMs renew and control the information flow in the block using these gates through the following equations:

  i_t = σ(W_{xi} · x_t + W_{hi} · h_{t−1} + b_i)   (1)
  f_t = σ(W_{xf} · x_t + W_{hf} · h_{t−1} + b_f)   (2)
  g_t = tanh(W_{xg} · x_t + W_{hg} · h_{t−1} + b_g)   (3)
  c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t   (4)
  o_t = σ(W_{xo} · x_t + W_{ho} · h_{t−1} + b_o)   (5)
  h_t = o_t ⊙ tanh(c_t)   (6)

where “·” denotes matrix multiplication, “⊙” denotes elementwise multiplication, and the W terms are the weight matrices. g_t is the input to the cell c, gated by the input gate i_t, while o_t is the output gate activation. The nonlinear functions σ and tanh are applied elementwise. Equations (1) and (2) establish the gate activations, equation (3) computes the cell input, equation (4) determines the new cell state, where the “memories” are stored or deleted, and equation (5) yields the output gate activation used in equation (6) to produce the final output h_t.
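For illustration, a minimal NumPy sketch of a single LSTM cell step following equations (1)–(6) is shown below. The weight layout (one matrix over the concatenated input and hidden state, split into four gate blocks) and all dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step. W maps the concatenated [x_t, h_prev] to the
    four gate pre-activations (input, forget, cell, output) stacked row-wise."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b  # all gate pre-activations at once
    i_t = sigmoid(z[0:n])            # input gate, eq. (1)
    f_t = sigmoid(z[n:2 * n])        # forget gate, eq. (2)
    g_t = np.tanh(z[2 * n:3 * n])    # cell input, eq. (3)
    c_t = f_t * c_prev + i_t * g_t   # new cell state, eq. (4)
    o_t = sigmoid(z[3 * n:4 * n])    # output gate, eq. (5)
    h_t = o_t * np.tanh(c_t)         # hidden output, eq. (6)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3                   # illustrative sizes
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
```

Because o_t is in (0, 1) and tanh is bounded by 1, the hidden output h_t always stays strictly inside (−1, 1).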

2.2. CNN Algorithms

CNN is a feed-forward neural network. To optimize the network architecture and determine the unknown parameters in the network, the attributes of a two-dimensional image are extracted and backpropagation algorithms are applied. To obtain the final outcome, the sampled data are fed into the network to extract the needed attributes after pre-refining; then, classification or regression is applied [27].

The CNN is basically composed of two types of layers: convolutional and pooling layers. The neurons of a convolutional layer are locally connected to those of the preceding layer, so the neurons' local attributes are obtained. Local invariance is achieved through the pooling layer, which aggregates the attributes repeatedly. Together, the convolution and pooling layers reduce the attribute resolution and the number of network parameters that require tuning.

CNN typically describes data structured as a two-dimensional array and is extensively utilized in the area of image processing. In this paper, the CNN algorithm is configured to predict wind speed and adapted to process a one-dimensional array of data. In the preprocessing phase, the one-dimensional data are reconstructed into a two-dimensional array, which enables the CNN to process the data directly. This creates two files, the property file and the response file, which are delivered as inputs to the CNN. The response file contains the expected output values.

Each sample is represented by a line from the property and response files. Weights and biases can be obtained as soon as a sufficient number of samples to train the CNN is delivered. Training proceeds by comparing the regression results with the response values in order to reach the minimum possible error. This delivers the final trained CNN model, which is used to produce the needed predictions.

A key mechanism of the CNN is pooling. Various computational studies have shown that two pooling approaches can be used: average pooling and maximum pooling. Images are stationary, in the sense that all parts of an image share similar statistics; therefore, the pooling approach applies the same average or maximum calculation to every part of a high-resolution image. The pooling process reduces the dimensionality of the statistics and increases the generalization strength of the model: the results are better optimized and have a lower risk of overfitting.
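The two pooling variants above can be sketched in a few lines of NumPy; the non-overlapping one-dimensional form shown here (function name and example values are illustrative) matches how this paper's reshaped wind speed series would be downsampled.

```python
import numpy as np

def pool_1d(x, size, mode="max"):
    """Non-overlapping 1-D pooling: reduces resolution by a factor of `size`."""
    trimmed = np.asarray(x, dtype=float)
    trimmed = trimmed[: (len(trimmed) // size) * size].reshape(-1, size)
    return trimmed.max(axis=1) if mode == "max" else trimmed.mean(axis=1)

x = np.array([1.0, 3.0, 2.0, 8.0, 4.0, 6.0])
max_pooled = pool_1d(x, 2, "max")    # keeps the strongest response per block
avg_pooled = pool_1d(x, 2, "mean")   # keeps the average response per block
```

Max pooling keeps the strongest activation in each block, while average pooling smooths it; both halve the resolution here.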

2.3. ANN Algorithms

An ANN is built up of three layers: the input, hidden, and output layers. These layers correlate an input vector to an output scalar or vector using activation functions in the various neurons. With n inputs and m hidden neurons, the output of the jth hidden neuron can be computed using the following equation [14]:

  h_j = f( Σ_{i=1}^{n} w_{ij} v_{k−i} + b_j ),  j = 1, …, m,

where w_{ij} is the connection weight from the ith input node to the jth hidden node, v_{k−i} is the wind speed i steps behind the current sampling moment k, b_j is a bias term, and f is the activation function of the hidden layer. The future wind speed can then be predicted through

  v̂_k = g( Σ_{j=1}^{m} w_j h_j + b ),

where w_j is the connection weight from the jth hidden node to the output node, v̂_k is the predicted wind speed at the kth sampling moment, and g is the activation function of the output layer. By minimizing the error between the actual and predicted wind speeds, v_k and v̂_k, respectively, using the Levenberg–Marquardt (LM) algorithm, the nonlinear mapping capability of the ANN can be obtained [28].
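A minimal forward-pass sketch of this one-hidden-layer network follows. The tanh hidden activation, the linear output activation, and the random weights are illustrative assumptions; the paper trains the weights with Levenberg–Marquardt, which is not shown here.

```python
import numpy as np

def ann_forecast(lags, W_in, b_in, w_out, b_out):
    """One-hidden-layer MLP: h_j = tanh(sum_i w_ij * v_{k-i} + b_j),
    then v_hat_k = sum_j w_j * h_j + b (linear output)."""
    h = np.tanh(W_in @ lags + b_in)   # hidden-layer activations
    return w_out @ h + b_out          # predicted wind speed v_hat_k

rng = np.random.default_rng(1)
n_lags, n_hidden = 4, 6               # illustrative sizes
W_in = rng.standard_normal((n_hidden, n_lags)) * 0.3
b_in = np.zeros(n_hidden)
w_out = rng.standard_normal(n_hidden) * 0.3
lags = np.array([5.1, 4.8, 5.3, 5.0])  # the i-steps-behind wind speeds
v_hat = ann_forecast(lags, W_in, b_in, w_out, 0.0)
```

Training would repeat this forward pass while adjusting W_in, b_in, w_out, and b_out to shrink the error between v_hat and the measured wind speed.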

2.4. SVM Algorithms

Assume a set of N samples {(x_i, y_i)}, i = 1, …, N, with input vectors x_i and outputs y_i. The regression problem aims to identify a function f(x) that describes the correlation between inputs and outputs. The goal of SVR is to obtain a linear regression in the high-dimensional feature space delivered by mapping the primary input set using a predefined function φ, while minimizing the structural risk. This mechanism can be written as follows [15]:

  f(x) = w · φ(x) + b,
  min  (1/2)‖w‖² + C Σ_{i=1}^{N} L_ε(y_i, f(x_i)),

where w, b, and C, respectively, are the regression coefficient vector, the bias term, and the punishment coefficient, and L_ε is the ε-insensitive loss function. The regression problem can be handled by the following constrained optimization problem:

  min  (1/2)‖w‖² + C Σ_{i=1}^{N} (ξ_i + ξ_i*)
  s.t.  y_i − w · φ(x_i) − b ≤ ε + ξ_i,
        w · φ(x_i) + b − y_i ≤ ε + ξ_i*,
        ξ_i, ξ_i* ≥ 0,

where ξ_i and ξ_i* represent the slack variables that make the constraints feasible. By using Lagrange multipliers, the regression function can be written as follows:

  f(x) = Σ_{i=1}^{N} (α_i − α_i*) K(x_i, x) + b,

where α_i and α_i* are the Lagrange multipliers that fulfil the conditions 0 ≤ α_i ≤ C and 0 ≤ α_i* ≤ C, and K is a general kernel function. In this study, the well-known radial basis function (RBF) is chosen as the kernel function:

  K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)),

where σ defines the RBF kernel width [15].
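The RBF kernel and the resulting prediction form f(x) = Σ (α_i − α_i*) K(x_i, x) + b can be sketched directly in NumPy. The support vectors, multiplier differences, and bias below are made-up values for illustration; in practice they come out of the constrained optimization above.

```python
import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    """RBF kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = np.sum((np.asarray(x_i, float) - np.asarray(x_j, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, alpha_diff, b, sigma=1.0):
    """SVR prediction: f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(a * rbf_kernel(sv, x, sigma)
               for a, sv in zip(alpha_diff, support_vectors)) + b

# Hypothetical support vectors (lagged wind speeds) and multipliers:
svs = [np.array([5.0, 4.8]), np.array([6.1, 6.0])]
pred = svr_predict(np.array([5.5, 5.2]), svs, alpha_diff=[0.7, -0.3], b=5.0)
```

Note that K(x, x) = 1 for any point, and the kernel decays towards 0 as points move apart, so nearby support vectors dominate the prediction.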

2.5. ConvLSTM Algorithm

ConvLSTM is designed to be trained on spatial information in the dataset, and it accepts three-dimensional data as input. Furthermore, it replaces the matrix multiplication with a convolution operation at every gate of the LSTM cell. By doing so, it is able to capture the underlying spatial features in multidimensional data. The formulas used at each of the gates (input, forget, and output) and for the state updates are as follows:

  i_t = σ(W_{xi} ∗ X_t + W_{hi} ∗ H_{t−1} + b_i)
  f_t = σ(W_{xf} ∗ X_t + W_{hf} ∗ H_{t−1} + b_f)
  o_t = σ(W_{xo} ∗ X_t + W_{ho} ∗ H_{t−1} + b_o)
  C_t = f_t ⊙ C_{t−1} + i_t ⊙ tanh(W_{xc} ∗ X_t + W_{hc} ∗ H_{t−1} + b_c)
  H_t = o_t ⊙ tanh(C_t)

where i_t, f_t, and o_t are the input, forget, and output gates, the W terms are the weight matrices, X_t is the current input data, H_{t−1} is the previous hidden output, and C_t is the cell state.

The difference from the corresponding LSTM equations is that the matrix multiplication (·) is substituted by the convolution operation (∗) between the weights and each of X_t and H_{t−1} at every gate. By doing so, the fully connected layer is replaced by a convolutional layer, and the number of weight parameters in the model can be significantly reduced.
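To make the substitution concrete, the sketch below computes the three ConvLSTM gates with a one-dimensional "same"-padded convolution standing in for the matrix multiplication. The 1-D setting, kernel size, and random weights are illustrative assumptions (real ConvLSTM layers typically convolve over 2-D feature maps).

```python
import numpy as np

def conv_same(x, w):
    """'Same'-padded 1-D convolution, used in place of matrix multiplication."""
    return np.convolve(x, w, mode="same")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_gates(x_t, h_prev, kernels, biases):
    """Input/forget/output gate maps where W * X replaces W . X."""
    gates = {}
    for name in ("i", "f", "o"):
        wx, wh = kernels[name]  # small conv kernels instead of dense matrices
        gates[name] = sigmoid(conv_same(x_t, wx) + conv_same(h_prev, wh)
                              + biases[name])
    return gates

rng = np.random.default_rng(2)
x_t, h_prev = rng.standard_normal(8), rng.standard_normal(8)  # length-8 "maps"
kernels = {n: (rng.standard_normal(3), rng.standard_normal(3))
           for n in ("i", "f", "o")}  # kernel size 3 per gate
biases = {n: 0.0 for n in ("i", "f", "o")}
g = convlstm_gates(x_t, h_prev, kernels, biases)
```

Each gate here needs only a size-3 kernel per input instead of an 8×8 dense matrix, which is exactly the parameter reduction described above.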

3. Methodology

Due to the nonlinear, nonstationary attributes and the stochastic variations of wind speed time series, accurate prediction of wind speed is known to be a challenging task [29]. In this work, to improve the accuracy of the wind speed forecasting model, a comparison between five models is conducted to forecast wind speed from the available historical data. A new concept called multi-lags-one-step (MLOS) ahead forecasting is employed, and its effect on the accuracy of the five models is illustrated. Assume that we are at time index k. To forecast one output element in the future, v̂_{k+1}, the input dataset can be split into i lags (past data) v_k, v_{k−1}, …, v_{k−i+1}, where i ∈ {1, …, 10}. By doing so, the model can be trained on several elements before predicting a single event in the future. The model accuracy improved up to an optimum lagging point, which had the best accuracy; beyond this point, the model accuracy degraded, as illustrated in the Results section.
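The MLOS windowing described above can be sketched as a simple sliding window over the series: each sample holds the i most recent lags and the target is the single next value. The function name and example values are illustrative.

```python
import numpy as np

def make_mlos_dataset(series, n_lags):
    """Multi-lags-one-step: each sample is n_lags past values,
    and the target is the single next value in the series."""
    X = np.array([series[t:t + n_lags] for t in range(len(series) - n_lags)])
    y = np.array(series[n_lags:])
    return X, y

series = [5.0, 5.2, 4.9, 5.5, 6.0, 5.8]  # toy wind speed readings
X, y = make_mlos_dataset(series, n_lags=3)
# X[0] = [5.0, 5.2, 4.9] is paired with target y[0] = 5.5
```

Sweeping n_lags from 1 to 10 and re-fitting a model on each (X, y) pair is how the optimum lagging point can be located.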

Figure 1 illustrates the workflow of the forecasting model. Specifically, the proposed methodology entails four steps.

In Step 1, the data are collected and averaged from 5 minutes to 30 minutes and to 1 hour, respectively. The datasets are then standardized to a mean value of 0 and a standard deviation of 1. The lagging stage in Step 2 is very important, as the data are split into different lags to study the effect of training the models on more than one element (input) to predict a single event in the future. In Step 3, the models are applied, taking into consideration that models such as CNN, LSTM, and ConvLSTM need their inputs adjusted from a matrix-shape perspective, since these models normally operate on two-dimensional or higher-dimensional arrays; matrix manipulation and reshaping are conducted in this stage. To check and evaluate the proposed models, three main metrics (MAE, RMSE, and R2) are used in Step 4 to validate the case study. In addition, the execution time and optimum lag are taken into account to select the best model.
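The standardization in Step 1 can be sketched as below. Fitting the mean and standard deviation on the training portion only (so no information leaks from the held-out sets) is a common-practice assumption; the paper does not state which portion its statistics come from.

```python
import numpy as np

def standardize(train, other):
    """Standardize to mean 0 and std 1 using training statistics only,
    then apply the same transform to a held-out set."""
    mu, sd = train.mean(), train.std()
    return (train - mu) / sd, (other - mu) / sd

train = np.array([4.0, 5.0, 6.0, 5.0])  # toy wind speeds (m/s)
held_out = np.array([5.5, 4.5])
train_z, held_out_z = standardize(train, held_out)
```

After the transform, the training set has exactly zero mean and unit standard deviation, while the held-out set is shifted and scaled by the same constants.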

Algorithm 1 illustrates the training procedure for ConvLSTM.

Input: the wind speed time series data
Output: forecasting performance indices
(1) The wind speed time series data are measured every 5 minutes, being averaged two times for 30 minutes and 1 hour, respectively.
(2) The wind datasets are split into Training, Validation, and Test sets.
(3) Initiate the multi-lags-one-step (MLOS) arrays for Training, Validation, and Test sets.
(4) Define MLOS range as {1 : 10} to optimize the number of needed lags.
(5) loop 1:
  Split the first set based on MLOS range
  Initiate and Extract set features with CNN layer
  Pass the output to a defined LSTM layer
  Select the first range of the number of hidden neurons
  Generate prediction results performance indices
  Count time to execute and produce prediction results
  Save and compare results with previous ones
  loop 2:
   Select next MLOS range
   If MLOS range = maximum range, then goto loop 3 and initialize MLOS range
   goto loop 1
   loop 3:
    Select new number of hidden neurons
    If number of hidden neurons range = maximum range, then goto loop 4 and initialize number of hidden neurons range
    goto loop 1
(6) loop 4:
  Select new dataset from the sets {5 min, 30 min, and 1 hr}
  goto loop 1
  If set range = maximum range, then:
  Generate the performance indices of all tested sets.
  Select the best results metrics
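The nested loops of Algorithm 1 amount to a grid search over datasets, MLOS lag counts, and hidden-neuron counts. The sketch below captures that control flow with the model training stubbed out; the scorer, ranges, and dataset names are hypothetical placeholders, not the paper's actual training code.

```python
import itertools
import time

def grid_search(datasets, lag_range, neuron_range, train_and_score):
    """Sketch of Algorithm 1's loops: try every (dataset, lags, neurons)
    combination, time each run, and keep the best-scoring configuration."""
    best = None
    for name, lags, neurons in itertools.product(datasets, lag_range, neuron_range):
        start = time.perf_counter()
        score = train_and_score(name, lags, neurons)  # e.g. validation R2
        elapsed = time.perf_counter() - start
        if best is None or score > best["score"]:
            best = {"dataset": name, "lags": lags, "neurons": neurons,
                    "score": score, "seconds": elapsed}
    return best

# Stubbed scorer for illustration only: it happens to peak at 4 lags
# and 15 hidden neurons, mimicking the optimum reported later.
stub = lambda name, lags, neurons: -abs(lags - 4) - abs(neurons - 15) / 10
best = grid_search(["5 min", "30 min", "1 hr"], range(1, 11),
                   range(5, 31, 5), stub)
```

In the real procedure, train_and_score would build the ConvLSTM on the MLOS-windowed set and return a validation metric, and the timing would feed the execution-time comparison.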

4. Collected Data

Table 3 illustrates the characteristics of the collected data with a 5-minute time span. The data are taken from a real wind speed dataset covering a three-year period from the West Texas Mesonet, with a 5-minute observation period, from a station near Lake Alan Henry in Garza County [30]. The data are processed by averaging from 5 minutes to 30 minutes (whose statistical characteristics are given in Table 4) and once more to 1 hour (whose statistical characteristics are given in Table 5). The goal of the averaging is to study the effect of reducing the data size, in order to compare the five models and select the one that achieves the highest accuracy across the three dataset cases. As shown in the three tables, the datasets are almost identical and retain their seasonality; they are not noticeably affected by the averaging process.
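The two averaging passes (5 minutes → 30 minutes → 1 hour) amount to taking means over non-overlapping blocks of 6 and then of 2 samples. A minimal sketch with made-up values:

```python
import numpy as np

def block_average(x, factor):
    """Average consecutive blocks of `factor` samples,
    e.g. factor=6 turns 5-minute samples into 30-minute means."""
    trimmed = np.asarray(x, dtype=float)
    trimmed = trimmed[: (len(trimmed) // factor) * factor]
    return trimmed.reshape(-1, factor).mean(axis=1)

five_min = np.arange(24, dtype=float)   # two hours of toy 5-minute samples
half_hour = block_average(five_min, 6)  # 30-minute means
one_hour = block_average(half_hour, 2)  # 1-hour means
```

Block averaging preserves the overall mean (and hence slow seasonal patterns) while reducing the number of samples, which is consistent with the observation that the three datasets stay statistically similar.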

The data were split into three sets (training, validation, and test) in the ratio 53:14:33.
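A 53:14:33 split can be sketched as below. The chronological (unshuffled) ordering is an assumption here — it is the usual choice for time series so that the test set lies strictly in the future of the training set — as the paper does not state the split order explicitly.

```python
import numpy as np

def chrono_split(series, fractions=(0.53, 0.14, 0.33)):
    """Chronological train/validation/test split (no shuffling,
    so the temporal order of the series is preserved)."""
    n = len(series)
    i = int(n * fractions[0])
    j = i + int(n * fractions[1])
    return series[:i], series[i:j], series[j:]

data = np.arange(100)  # stand-in for the standardized wind speed series
train, val, test = chrono_split(data)
```

With 100 samples this yields exactly 53, 14, and 33 points, and every validation point comes after the last training point.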

5. Results and Discussion

To quantitatively evaluate the performance of the predictive models, three commonly used statistical measures are computed [20]. All of them quantify the deviation between the actual and predicted wind speed values. Specifically, RMSE, MAE, and R2 are defined as follows:

  RMSE = sqrt( (1/N) Σ_{i=1}^{N} (v_i − v̂_i)² ),
  MAE = (1/N) Σ_{i=1}^{N} |v_i − v̂_i|,
  R2 = 1 − Σ_{i=1}^{N} (v_i − v̂_i)² / Σ_{i=1}^{N} (v_i − v̄)²,

where v_i and v̂_i are the actual and predicted wind speeds, respectively, and v̄ is the mean value of the actual wind speed sequence. Smaller values of RMSE and MAE indicate an improved forecasting procedure, while R2 is the goodness-of-fit measure of the model, so the larger its value, the better the model fits. The testbed environment configuration is as follows:

(1) CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz, 2001 MHz, 4 core(s), 8 logical processor(s)
(2) RAM: installed physical memory 16.0 GB
(3) GPU: AMD Radeon(TM) RX 550 10 GB
(4) Framework: Anaconda 2019.07, Python 3.7
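The three metrics translate directly into NumPy; the sample values below are made up for illustration.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """RMSE, MAE, and R2 between actual and predicted wind speeds."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rmse = np.sqrt(np.mean(err ** 2))                       # root mean square error
    mae = np.mean(np.abs(err))                              # mean absolute error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return rmse, mae, r2

rmse, mae, r2 = forecast_metrics([5.0, 6.0, 7.0, 8.0], [5.1, 5.9, 7.2, 7.8])
```

A perfect forecast gives RMSE = MAE = 0 and R2 = 1; RMSE penalizes large errors more heavily than MAE because of the squaring.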

Table 6 lists the chosen optimized internal parameters (hyperparameters) for the forecasting methods used in this work. For each method, the optimal number of hidden neurons is chosen to achieve the maximum R2 and the minimum RMSE and MAE values.

After implementing CNN, ANN, LSTM, ConvLSTM, and SVM, the best fitted model was chosen based on its accuracy in predicting future wind speed values; the seasonality of the data is thereby considered in the forecast mechanism. The chosen model has to deliver the best fit with the least amount of error, taking into consideration the nature of the data and not applying naive forecasting to it.

To achieve this goal, the statistical error indicators are calculated for every model and time lapse, as Figure 2 illustrates. The results suggest that the ConvLSTM model has the best performance compared to the other four models: the chosen model has to reach the minimum RMSE and MAE values and the maximum R2 value.

Different parameters are also examined to support the decision of choosing the best fitted model. The optimum number of lags, presented in Table 7, is one of the most important indicators in selecting the best fitted model: the fewer historical points the model needs, the lower its computational effort. For each method, the optimal number of lags is chosen to achieve the maximum R2 and the minimum RMSE and MAE values. For instance, Figures 3 and 4 show the relation between the statistical measures and the number of lags and hidden neurons, respectively, for the proposed ConvLSTM method in the 5-minute time span case. It can be seen that 4 lags and 15 hidden neurons achieved the maximum R2 and the minimum RMSE and MAE values.

The execution time shown in Table 8 is calculated for each method and time lapse to verify that the final chosen model is efficient and can effectively predict future wind speed. The shorter the execution time, the more efficient and practical the model; this is also a sign that the model lends itself to further modifications. According to Table 8, the ConvLSTM model beats all other models in the time needed to process the historical data and deliver a final prediction: SVM needed 54 minutes to accomplish the training and produce testing results, while ConvLSTM did so in just 1.7 minutes. This large difference motivated the choice of ConvLSTM.

Figure 5 shows that the 5-minute lapse dataset is the best fitted dataset for the chosen model and indicates how accurate the prediction of future wind speed will be.

For completeness, to effectively evaluate the investigated forecasting techniques in terms of their prediction accuracy, a 50-trial cross-validation procedure is carried out, in which the investigated techniques are built and then evaluated on 50 different training and test datasets randomly sampled from the overall available dataset. The ultimate performance metrics are reported as the average and the standard deviation of the 50 metrics obtained across the cross-validation trials. In this regard, Figure 6 shows the average performance metrics on the test dataset using this procedure. It can easily be recognized that the forecasting models employing the LSTM technique outperform the other investigated techniques in terms of the three performance metrics R2, RMSE, and MAE.

From the experimental results of short-term wind speed forecasting shown in Figure 6, we can observe that ConvLSTM performs best in terms of the forecasting metrics (R2, RMSE, and MAE) compared to the other models (i.e., CNN, ANN, SVR, and LSTM). The related statistical tests in Tables 6 and 7, respectively, confirm the effectiveness of ConvLSTM and its capability of handling noisy, large data. ConvLSTM produced highly accurate wind speed predictions with fewer lags and hidden neurons, which is reflected in the results of Table 8 through lower computation time compared to the other tested models. Furthermore, we introduced multi-lags-one-step (MLOS) ahead forecasting combined with the hybrid ConvLSTM model to provide efficient generalization to new time series data and predict wind speed accurately. The results show that the ConvLSTM model proposed in this paper is an effective and promising model for wind speed forecasting.

Similar to our work, the EnsemLSTM model proposed by Chen et al. [15] contained different clusters of LSTMs with different hidden layers and hidden neurons. They combined the LSTM clusters with SVR and an external optimizer in order to enhance the generalization capability and robustness of their model. However, their model showed high computational complexity with mediocre performance indices. Our proposed ConvLSTM with MLOS boosts generalization and robustness on new time series data while producing high performance indices.

6. Conclusions

In this study, we proposed a hybrid deep learning-based framework, ConvLSTM, for short-term prediction of wind speed time series measurements. The proposed dynamic prediction model was optimized with respect to the number of input lags and the number of internal hidden neurons. Multi-lags-one-step (MLOS) ahead wind speed forecasting using the proposed approach showed superior results compared to four other models built using standard ANN, CNN, LSTM, and SVM approaches. The proposed modeling framework combines the benefits of CNN and LSTM networks in a hybrid modeling scheme that delivers highly accurate wind speed predictions with fewer lags and hidden neurons, as well as lower computational complexity. For future work, the accuracy of the ConvLSTM model could be further improved, for instance, by increasing and optimizing the number of hidden layers, applying multi-lags-multi-steps (MLMS) ahead forecasting, and introducing a reinforcement learning agent to optimize the parameters in comparison with other optimization methods.

Data Availability

The wind speed data used in this study have been taken from the West Texas Mesonet of the US National Wind Institute (http://www.depts.ttu.edu/nwi/research/facilities/wtm/index.php). Data are provided freely for academic research purposes only and cannot be shared/distributed beyond academic research use without permission from the West Texas Mesonet.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the help of Prof. Brian Hirth in Texas Tech University for providing them with access to weather data.