Abstract

Soil temperature (Ts) is an important parameter in Earth system research. Its variation across regions and over time is affected by many factors, which makes accurate and robust prediction difficult. In this paper, we propose an embedded network prediction model based on the gated recurrent unit (GRU) that learns both the local and global features of historical soil temperature to improve prediction performance. We feed input sequences of different step lengths into the GRU model and weight the outputs to obtain the final prediction. To capture the global characteristics of soil temperature, we connect the earlier steps directly to the output layer; the local characteristics are captured through the subsequent steps. We use soil temperature data from two meteorological stations in Switzerland (Laegern and Fluehli) to predict the soil temperature at different soil depths (5 cm, 10 cm, and 15 cm) and different lead times (6 hrs, 12 hrs, and 24 hrs), with RMSE, MAE, MSE, and R² as the evaluation criteria. The experimental results show that our method performs best among the compared models (artificial neural network (ANN), extreme learning machine (ELM), long short-term memory network (LSTM), and gated recurrent unit network (GRU)). In particular, for the soil temperature at the depth of 10 cm at the Fluehli station in the coming 6 hrs, our model achieved the maximum R² (0.9914) and the minimum RMSE (0.4668), MAE (0.2585), and MSE (0.2214) compared with the other four models. Therefore, our model can not only predict the soil temperature at different depths but also improve the prediction accuracy.

1. Introduction

Geoscience plays an important role in social development and economic construction. Soil temperature (Ts) and its daily fluctuations are among the most vital meteorological parameters in fields related to the Earth sciences, such as agriculture, forestry, and geology [1, 2], and Ts is an important variable in land-atmosphere interactions [3]. Many factors affect soil temperature; for example, soil depth has a prominent effect. Research has shown that, during plant growth, shallow soil has a significant impact on seed germination, while deep soil affects root absorption activity [4]. Therefore, accurate prediction of soil temperature at different depths can guide practical applications in these fields and can replace manual on-site measurement with traditional sensors [5].

Currently, most soil temperature prediction methods rely on environmental factors as inputs [6]. However, in some regions these data are unavailable or cannot be used for prediction, which reduces the accuracy of the models [7]. Therefore, this paper recommends using the time series itself as the input data for the soil temperature prediction model.

In recent years, researchers have mainly predicted soil temperature with physical models based on the heat transfer mechanism of the soil itself [6, 8]. However, these models have many limitations in practical applications because of parameterization and scale issues [9]. With the continuous development of machine learning, such methods have been widely used in Earth sciences [10–12]. Ghorbani et al. proposed a method based on the support vector machine to estimate the soil field capacity and permanent wilting point [13]. Machine learning also plays an important role in soil temperature research [14]. Feng et al. used the extreme learning machine to improve the accuracy of soil temperature prediction [15]. Furthermore, LSTM has also received attention from researchers [16].

Among machine learning methods, the artificial neural network (ANN) [2, 17, 18] and ELM [15, 19] can learn features from the input data without a physical model, so there is no need to understand the underlying physical processes. The artificial neural network has strong self-adaptation and self-learning capabilities and can continuously update its parameters to bring the output closer to the real value, so it is commonly used as a soil temperature prediction model [20–22]. Bilgili proposed a method based on the artificial neural network to predict monthly average soil temperature [23]. Mehdizadeh et al. used models based on feedforward backpropagation neural networks (FFBPNN) and gene expression programming (GEP) to estimate the daily soil temperature at different depths [24]. When a traditional artificial neural network is used to predict soil temperature, the accuracy of the output is reduced because the correlation within the time series is not considered. To address this problem, many researchers have combined genetic algorithms with artificial neural networks to optimize the networks [20, 25, 26]. However, the genetic algorithm is inefficient and prone to premature convergence, so this approach still needs further research.

Deep learning methods are widely used to process time series data. Compared with recurrent neural networks (RNN) and LSTM, GRU has a simpler structure and can solve the problem of long-term dependence [27]. Therefore, this paper chooses a model based on the GRU network to predict soil temperature. The model uses hidden states to convey information and to capture the relevance within time series data, and its network structure has led to wide use in many fields. Liu et al. proposed GRU-based nonlinear predictive denoising autoencoders for fault diagnosis of rolling bearings [28]. Miau and Hung designed the Conv-GRU model to estimate the water level, and their results show the effectiveness of the method [29]. Rui et al. combined GRU and LSTM models to predict traffic flow [30]. To our knowledge, the GRU network has not previously been used for soil temperature prediction.

This article focuses on two questions. The first is how to choose the input data for the Ts estimation model. The estimation of Ts is affected by the past Ts. Although the relevant meteorological data have some impact on Ts estimation, errors between the provided data and the real data degrade the accuracy of the model. Consequently, this paper concentrates on the Ts time series. The second is how to construct the network model. In the GRU network, information is transferred from step to step by updating the hidden state. As the number of time steps increases, the correlation between the initial data and the output decreases, which lowers the prediction accuracy.

The motivation of this paper is to address the long-term dependence in soil temperature series, which lowers prediction accuracy. To this end, we propose a new embedded estimation model based on the GRU network for Ts estimation. To capture the global characteristics of soil temperature, we connect the earlier steps directly to the output layer; the local characteristics are captured through the subsequent steps. We assign different step lengths to the channels, treat the channel outputs as cells, and compute the estimate by fully connecting these cells. Past Ts data from the Laegern and Fluehli stations in Switzerland from 2006 to 2014 are used to estimate Ts in the next 6 hrs, 12 hrs, and 24 hrs at different soil depths (5, 10, and 15 cm).

The main contributions of this paper for Ts estimation are as follows:
(1) To our knowledge, the GRU network had not yet been used for soil temperature prediction; this paper implements a GRU-based method for estimating soil temperature.
(2) To capture the global characteristics of soil temperature, we connect the earlier steps directly to the output layer; the local characteristics are captured through the subsequent steps.
(3) The results show that our method performs better than the other advanced methods considered.

2. Materials and Methods

2.1. The Framework for Soil Temperature Estimation

First, we extract the past Ts data for the Laegern and Fluehli stations from FLUXNET as the input to our model. We also consider several other machine learning models (LSTM, BPNN, ELM, and GRU) for comparison. In our model, we connect the earlier steps directly to the output layer to predict Ts. Finally, we compare the results using several statistical evaluation criteria (RMSE, MAE, MSE, and R²) to evaluate the performance of the model. Figure 1 shows the overall structure of our study.

2.2. The Structure of GRU

The GRU network has a simple structure and trains quickly, and it can carry relevant information along the time series for prediction. Because of these advantages, it is widely applied in many fields. The GRU can handle time series problems and the gradient problem in backpropagation. The GRU unit structure is shown in Figure 2. It has two gates: the reset gate and the update gate. The update gate decides how much past information is kept and how much new information is added. The reset gate decides how much past information to forget.

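The calculation formulas of the GRU are as follows:

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)$$
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(W_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t]\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
$$y_t = \sigma\left(W_o \cdot h_t\right)$$

where $x_t$ represents the input value at the current moment and $h_{t-1}$ is the hidden state of the previous node; together they determine the gate status. $r_t$ is the reset gate, $z_t$ is the update gate, $\tilde{h}_t$ represents the new memory, $h_t$ represents the hidden state, $y_t$ is the output of the output layer, $W_r$, $W_z$, $W_{\tilde{h}}$, and $W_o$ are weight matrices, $\sigma$ is the sigmoid activation function, and $\tanh$ is the hyperbolic tangent function.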
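As a minimal sketch of a single GRU update, the following NumPy code applies the standard reset-gate, update-gate, and new-memory computations to a short sequence. It assumes the common bias-free formulation in which each weight matrix acts on the concatenation of the previous hidden state and the current input. The function names, weight names, hidden size, and sample temperatures are illustrative and are not taken from the experiments.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One bias-free GRU update; each weight matrix acts on [h, x]."""
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    z_t = sigmoid(W_z @ concat)                                   # update gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # new memory
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # new hidden state

rng = np.random.default_rng(0)
hidden, features = 4, 1                       # illustrative sizes
W_r, W_z, W_h = (rng.normal(scale=0.5, size=(hidden, hidden + features))
                 for _ in range(3))

h = np.zeros(hidden)
for ts in [5.1, 5.3, 5.8]:                    # hypothetical soil temperatures
    h = gru_step(np.array([ts]), h, W_r, W_z, W_h)
```

Because the new hidden state is a gate-weighted average of the previous state and a tanh output, every component of `h` stays within [-1, 1] when the initial state is zero.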

2.3. The Structure of Our Model

As analyzed above, when the GRU network contains a large number of cells, the correlation between the features decreases as the time series extends. We therefore propose a GRU-based model to solve this problem and improve the accuracy of the estimation model; its topological structure is shown in Figure 3.

Our network is composed of the traditional GRU network and the auxiliary networks. In the GRU model, information is passed to the next cell through the hidden state, which weakens the correlation with the earlier steps. The traditional GRU network serves as the basic network to obtain local features, and the auxiliary network is composed of the outputs at different steps to obtain global features; the merged features form the output of the entire network model. Feeding the past Ts data into our model lets it learn the pattern of periodic changes in Ts, which strengthens the correlation between past Ts data and improves the prediction accuracy.

The final output of our model at time step t combines all the channels through the fully connected layer.
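One way to read this combination is sketched below in NumPy, under stated assumptions: a single bias-free GRU is run over the input window, the hidden states at a few chosen steps act as the channels, and one dense layer maps their concatenation to a scalar estimate. The choice of steps (6, 12, 18, 24), the hidden size, the weight initialization, and all function names are hypothetical; the actual layer configuration of the model may differ.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_hidden_states(window, W_r, W_z, W_h):
    """Run a bias-free GRU over a 1-D window; return the hidden state after every step."""
    h = np.zeros(W_r.shape[0])
    states = []
    for value in window:
        x = np.array([value])
        concat = np.concatenate([h, x])
        r = sigmoid(W_r @ concat)                       # reset gate
        z = sigmoid(W_z @ concat)                       # update gate
        h_tilde = np.tanh(W_h @ np.concatenate([r * h, x]))
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)                             # shape: (steps, hidden)

def combined_output(window, channel_steps, gru_weights, w_out, b_out):
    """Concatenate the hidden states at the chosen steps and apply one dense layer."""
    states = gru_hidden_states(window, *gru_weights)
    picked = np.concatenate([states[k - 1] for k in channel_steps])
    return float(w_out @ picked + b_out)

rng = np.random.default_rng(0)
hidden = 8                                              # illustrative hidden size
channel_steps = [6, 12, 18, 24]                         # hypothetical: four channels
gru_weights = tuple(rng.normal(scale=0.3, size=(hidden, hidden + 1)) for _ in range(3))
w_out = rng.normal(scale=0.3, size=hidden * len(channel_steps))
b_out = 0.0

window = 8.0 + np.sin(np.linspace(0.0, np.pi, 24))      # synthetic temperatures
prediction = combined_output(window, channel_steps, gru_weights, w_out, b_out)
```

In this reading, the hidden states from the early steps reach the output layer directly instead of only through the recurrent chain, which is the property the text attributes to the auxiliary network.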

2.4. Objective Function Optimization Algorithm

The adaptive moment estimation (Adam) algorithm combines high computational efficiency with low memory requirements [31]. It computes separate adaptive learning rates for different parameters. The method is suitable for processing large-scale data and optimizing many parameters, as well as for solving sparse gradient problems. Adam is widely used in the field of deep learning, and it is used to optimize the model in this paper. We use the mean square error as the objective function, calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the observed data from the stations, $\hat{y}_i$ is the output of our model, and $N$ is the total number of data.
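To illustrate how Adam minimizes a mean square error objective, the sketch below applies the standard Adam update to a toy one-parameter model. The moment coefficients (0.9 and 0.999) and the small constant 1e-8 are the commonly used defaults and are an assumption here, as are the toy data; only the learning rate of 0.03 matches a value reported in the experiments.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between observations and model outputs."""
    return float(np.mean((y_true - y_pred) ** 2))

def adam_step(param, grad, m, v, t, lr=0.03, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy problem (illustrative only): fit y = w * x with a single weight w.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
w, m, v = 0.0, 0.0, 0.0
initial_loss = mse(y, w * x)
for t in range(1, 501):
    grad = np.mean(2.0 * (w * x - y) * x)   # derivative of the MSE with respect to w
    w, m, v = adam_step(w, grad, m, v, t)
final_loss = mse(y, w * x)
```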

The learning rate is an important parameter in deep learning, and its value affects the convergence of the objective function. When the learning rate is too small, convergence is very slow; when it is too large, the updates may overshoot and fail to converge. We use an exponentially decaying learning rate to improve the convergence of Adam. The decay step size is set to 100 and the decay rate to 0.96, so that the learning rate is adjusted continually as the algorithm approaches the optimal solution.
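The decay schedule can be written in a few lines. The sketch below assumes the usual exponential decay form, in which the learning rate is multiplied by the decay rate once per `decay_steps` training steps (the same form used by TensorFlow's `ExponentialDecay` schedule). The initial rate of 0.03 is taken from the reported hyperparameters; whether the schedule was applied continuously or in a staircase fashion is not stated, so both variants are shown.

```python
def decayed_learning_rate(initial_lr, step, decay_steps=100, decay_rate=0.96,
                          staircase=False):
    """Exponentially decayed learning rate at a given training step."""
    # Staircase mode changes the rate only at multiples of decay_steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

lr_start = decayed_learning_rate(0.03, step=0)
lr_after_100 = decayed_learning_rate(0.03, step=100)
lr_after_150_stair = decayed_learning_rate(0.03, step=150, staircase=True)
```

After 100 steps the rate has dropped by 4%, from 0.03 to 0.0288; in staircase mode it stays at 0.0288 until step 200.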

2.5. Model Training and Test

In this paper, the past Ts data (from the Laegern and Fluehli stations in Switzerland, 2006 to 2014) serve as the input to our model for estimating Ts, and the model is implemented with a TensorFlow backend. The model is tested on an Intel Core (TM) i7-5820K 3.30 GHz CPU with 64 GB of memory running PyCharm 2018. We use three-quarters of all data as training samples (2006.1.1–2013.3.14) and the remainder as testing samples (2013.3.15–2014.12.31). We assume that the value at point t of the time series is predicted from the first t − 1 elements; we use half a day of soil temperature data to predict the values of Ts in the following 6 hrs, 12 hrs, and 24 hrs, and set the value of t to 24. We compare our model with the other models (ANN, LSTM, ELM, and GRU) and calculate several evaluation criteria (RMSE, MAE, MSE, and R²) to assess the model performance, as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

where $N$ is the total number of data, $y_i$ is the observed value at moment $i$, $\hat{y}_i$ is the predicted value obtained through the different methods, and $\bar{y}$ is the average of the observed values. These criteria describe how well the model fits and how accurate its predictions are: the smaller the RMSE, MAE, and MSE and the larger the R², the better the model performs.
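The data preparation and the evaluation criteria can be sketched as follows. This is a minimal illustration, assuming that each sample pairs a fixed-length window of past values with the value a given number of steps ahead and that R² is the coefficient of determination. The helper names `make_windows` and `evaluate` and the toy series are hypothetical; with half-hourly data, a 6 hr lead time would correspond to a horizon of 12 steps.

```python
import numpy as np

def make_windows(series, window, horizon):
    """Pair each run of `window` values with the value `horizon` steps after the run ends."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window + horizon - 1:window + horizon - 1 + n]
    return X, y

def evaluate(y_true, y_pred):
    """RMSE, MAE, MSE, and coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "R2": float(1.0 - np.sum(err ** 2) / ss_tot),
    }

series = np.arange(10.0)                      # toy series standing in for Ts
X, y = make_windows(series, window=4, horizon=2)

split = int(0.75 * len(X))                    # chronological three-quarter split
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

The split keeps the samples in time order, so the test set always lies after the training set, as in the date ranges given above.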

2.6. Study Area and Field Experiment

This paper studies data from two stations in Switzerland (Laegern, 47.48° N, 8.37° E; Fluehli, 46.88° N, 8.01° E). We downloaded the past half-hourly Ts data from FLUXNET (https://fluxnet.fluxdata.org/) to verify our model. Both stations are located in domestic ecological nature reserves, where Ts has a certain impact on the surrounding ecological environment, such as plant growth, soil fertility, and microbial activities, as shown in Figure 4.

This article takes the past Ts data from the stations as the input. The data provided by the stations show that the soil temperature at the depth of 15 cm at the Laegern station is the most stable, with the smallest temperature difference and standard deviation. Table 1 lists the minimum value, maximum value, average value, standard deviation, skewness, and variation coefficient of the data.
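The summary statistics of this kind can be computed as in the sketch below. It assumes the sample standard deviation, a moment-based skewness, and a variation coefficient defined as the standard deviation divided by the mean; the exact definitions behind Table 1 are not stated, and the input values are placeholders rather than station data.

```python
import numpy as np

def describe(ts):
    """Minimum, maximum, mean, standard deviation, skewness, and variation coefficient."""
    ts = np.asarray(ts, dtype=float)
    mean = ts.mean()
    std = ts.std(ddof=1)                                      # sample standard deviation
    skew = np.mean((ts - mean) ** 3) / ts.std(ddof=0) ** 3    # moment-based skewness
    return {
        "min": float(ts.min()),
        "max": float(ts.max()),
        "mean": float(mean),
        "std": float(std),
        "skewness": float(skew),
        "cv": float(std / mean),          # undefined when the mean is zero
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])   # placeholder values, not station data
```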

3. Results and Discussions

In this section, we compare our model with the other four models (BPNN, LSTM, ELM, and GRU) for estimating Ts with the data from the two stations. We use Adam to optimize the models and run the experiments with scikit-learn. The ANN consists of an input layer, a hidden layer, and an output layer. The model performs best with the batch size set to 10000, the number of iterations to 100, the learning rate to 0.03, and the number of nodes to 32. For the ELM model, we apply the elm function with a sigmoid activation function in the hidden layer and the same number of nodes as the ANN. We set the same hyperparameters for the GRU and our model so that the comparison reflects the performance of our model.

3.1. Evaluation for the Hyperparameters in Our Model

The values of the hyperparameters influence the performance of the model. The hyperparameters considered here are the number of channels, the number of iterations, the learning rate, and the number of nodes. As an example, we estimate the soil temperature at the depth of 5 cm for 6 hrs ahead at the Laegern station. The results show that the number of nodes affects both the fitting of the model and its ability to capture relevant important information. Our model performs best with the learning rate set to 0.03, the number of nodes to 32, the number of channels to 4, and the number of iterations to 100.

Overfitting during training prevents the model from adapting to changes in the data, whereas underfitting prevents it from extracting the relevant information from the data. Moreover, when the learning rate is large, the predictive model can easily become trapped in a local optimum during learning; when the learning rate is small, the parameters hardly converge to the optimal values. The results are shown in Table 2.

3.2. Evaluation for Different Models

In this part, we compare our model with the other four models (BPNN, ELM, LSTM, and GRU). The inputs to all the predictive models are the past Ts data from the stations, and the outputs are the estimated Ts values in the following 6 hrs, 12 hrs, and 24 hrs.

The prediction results of the five models at the depths of 5, 10, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs for the Laegern station are shown in Table 3. At the depth of 5 cm in the coming 6 hrs, our model performs better than the other models. Relative to the other models, our model reduces the RMSE by 35.4% (GRU), 38.9% (LSTM), 39.1% (ELM), and 53.4% (BPNN); the MAE by 42.9% (GRU), 46.2% (LSTM), 45.4% (ELM), and 58.2% (BPNN); and the MSE by 57.3% (GRU), 58.6% (LSTM), 58.8% (ELM), and 75.9% (BPNN). Its R² is 0.9638, compared with 0.9058 (GRU), 0.9034 (LSTM), 0.9022 (ELM), and 0.8319 (BPNN). With lower RMSE, MAE, and MSE and a higher R², our model clearly performs very well on Ts estimation. For the depth of 5 cm in the following 12 hrs and 24 hrs, we can draw the same conclusion: our model still outperforms the other four models. The accuracy improves steadily as the soil depth increases from 5 cm to 15 cm, but it decreases as the lead time increases from 6 hrs to 24 hrs, which may be caused by systematic errors when the model makes long-term predictions [32]. However, the ELM model achieves a higher level of accuracy than the other models for the depth of 5 cm in the following 12 hrs, the depth of 10 cm in the following 6 hrs, and the depth of 15 cm. The reason might be that ELM is a feedforward neural network architecture in which parameters are randomly chosen [19], so a nonoptimal solution may sometimes be generated, which affects the performance of the model.
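The percentage reductions quoted here follow the usual relative-change formula, in which the difference between a baseline model's error and the proposed model's error is divided by the baseline error. A minimal sketch is given below; the two RMSE values are hypothetical and serve only to illustrate the arithmetic, since the underlying values are reported in Table 3.

```python
def percent_reduction(baseline_error, model_error):
    """Relative reduction of an error metric with respect to a baseline, in percent."""
    return (baseline_error - model_error) / baseline_error * 100.0

# Hypothetical RMSE values, for illustration only (not taken from Table 3).
baseline_rmse = 2.0
proposed_rmse = 1.5
reduction = percent_reduction(baseline_rmse, proposed_rmse)
```

With these placeholder values, the proposed model's RMSE is 25% lower than the baseline's.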

The linear relationship between the estimated and observed values at the depths of 5 cm, 10 cm, and 15 cm at the Laegern station in the following 6 hrs, 12 hrs, and 24 hrs is shown in Figure 5. The scatter plots show that the fitted line of our model is closer to the ideal line (y = x) and that its R² is higher than that of the other models. For instance, Figures 5(a) and 5(b) show that the R² of our model is 0.9638 for the depth of 5 cm in the following 6 hrs, with the linear relationship y = 1.0063x + 0.0438. All the tested models perform well at the depth of 15 cm in the following 6 hrs and 12 hrs, as shown in Figure 5(c), because the estimated and observed values are more consistent in this case. As the lead time extends to 24 hrs, our model still maintains good accuracy, whereas the accuracy of the other models starts to decrease. Overall, the experimental results show that our model has a certain degree of robustness for long-term estimation.
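A line of the form y = ax + b relating estimated to observed values can be obtained with an ordinary least-squares fit. The sketch below uses NumPy's `polyfit` on synthetic data generated to lie exactly on the line reported for the 5 cm depth at 6 hrs, so the fit simply recovers those coefficients; the arrays are not station data, and the fitting procedure behind Figure 5 is assumed rather than known.

```python
import numpy as np

def fit_line(observed, estimated):
    """Least-squares slope and intercept of estimated = a * observed + b."""
    slope, intercept = np.polyfit(observed, estimated, deg=1)
    return float(slope), float(intercept)

observed = np.linspace(0.0, 20.0, 50)        # synthetic observed temperatures
estimated = 1.0063 * observed + 0.0438       # synthetic estimates lying on one line
slope, intercept = fit_line(observed, estimated)
```

A slope close to 1 and an intercept close to 0 indicate that the estimates track the ideal line y = x.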

The frequency plot of the absolute estimation error is shown in Figure 6, where each bar indicates the percentage of errors. Our model clearly has the highest frequency in the smallest error range when estimating Ts at the Laegern station. For example, for the depth of 5 cm in the following 6 hrs, the frequency for our model is 73.8%, compared with 46.5% (GRU), 45.9% (LSTM), 46.6% (ELM), and 36.8% (BPNN).
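An error-frequency distribution of this kind can be computed by binning the absolute errors and expressing each bin count as a percentage of all samples. The following sketch assumes hypothetical bin edges and toy values, because the bin widths used in Figure 6 are not given in the text.

```python
import numpy as np

def error_frequency(observed, estimated, edges):
    """Percentage of samples whose absolute error falls in each bin.

    Errors larger than the last edge are not counted in any bin.
    """
    abs_err = np.abs(np.asarray(observed, dtype=float)
                     - np.asarray(estimated, dtype=float))
    counts, _ = np.histogram(abs_err, bins=edges)
    return counts / abs_err.size * 100.0

observed = np.array([5.0, 6.0, 7.0, 8.0])     # toy values, not station data
estimated = np.array([5.1, 6.2, 7.6, 9.2])
edges = [0.0, 0.5, 1.0, 1.5]                  # hypothetical bin edges
freq = error_frequency(observed, estimated, edges)
```

Here half of the toy samples fall in the smallest error bin, which is the quantity compared across models in the text.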

The different predictive models were also tested on the data from the Fluehli station at the depths of 5 cm, 10 cm, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs; the results are shown in Table 4. Our model performs better than the others in most cases; for example, at the depth of 5 cm in the following 6 hrs at the Fluehli station, the proposed model achieved excellent results (RMSE = 0.6534, MAE = 0.3928, MSE = 0.4735, and R² = 0.9860) compared with the other four models. In general, our model performs better in most cases for estimating Ts across different regions, lead times, and soil depths.

4. Conclusions

Soil temperature (Ts) is an important land surface variable in Earth science and is used in many research fields; for example, it affects the growth and development of plants and the formation of soil. In this study, we examine the performance of the backpropagation neural network (BPNN), gated recurrent unit (GRU), extreme learning machine (ELM), long short-term memory (LSTM) network, and our model for estimating Ts at the depths of 5 cm, 10 cm, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs at the Laegern and Fluehli stations in Switzerland. The statistical results indicate that our model performs better than the other four models in most cases.

To reduce the influence of long time series on the accuracy of soil temperature estimation and to obtain the global characteristics of soil temperature, we connect the earlier steps directly to the output layer; the local characteristics are captured through the subsequent steps.

Soil temperature is affected by many factors, such as atmospheric temperature and precipitation. Ensemble empirical mode decomposition (EEMD) can decompose a time series into signals of different frequencies, so it may separate the complex soil temperature time series into more discriminative components. Therefore, in future work we will attempt to use EEMD to decompose the soil temperature time series and feed the features at different frequencies into the model to improve estimation accuracy.

Data Availability

The data included in this paper are available without any restriction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant no. 51805203, the Science and Technology Development Plan of Jilin Province 20190201023JC, and the Development and Reform Commission of Jilin Province (2019C054-2).