Abstract

Six machine-learning approaches, including multivariate linear regression (MLR), gradient boosting decision tree, k-nearest neighbors, random forest, extreme gradient boosting (XGB), and deep neural network (DNN), were compared for near-surface air-temperature (Tair) estimation from the new generation of Chinese geostationary meteorological satellite Fengyun-4A (FY-4A) observations. The brightness temperatures in split-window channels from the Advanced Geostationary Radiation Imager (AGRI) of FY-4A and numerical weather prediction data from the global forecast system were used as the predictor variables for Tair estimation. The performance of each model and the temporal and spatial distribution of the estimated Tair errors were analyzed. The results showed that the XGB model had better overall performance, with R2 of 0.902, bias of −0.087°C, and root-mean-square error of 1.946°C. The spatial variation characteristics of the Tair error of the XGB method were less obvious than those of the other methods. The XGB model can provide more stable and high-precision Tair for a large-scale Tair estimation over China and can serve as a reference for Tair estimation based on machine-learning models.

1. Introduction

Air temperature (Tair) is one of the basic meteorological observation parameters [1–3] and is of great concern in scientific disciplines such as hydrology, meteorology, and environmental science. Furthermore, it influences most land-surface processes, such as photosynthesis and land-surface evapotranspiration [4]. High-resolution Tair data can help reduce human health risks and support urban heat island research, so such information is crucial [5, 6]. The summer Tair value in China is generally above 20°C, except in high-altitude regions (e.g., the Qinghai-Tibet Plateau). Summer heat waves have a major impact on agricultural food production, as well as on the use of water and electricity [7]. This study focuses on summer Tair estimation in China using Advanced Geostationary Radiation Imager (AGRI) data.

Large-scale Tair data are mainly obtained by interpolation from the data collected by surface meteorological stations. However, the distribution of meteorological stations is usually uneven due to geographical factors, and some sparsely populated areas even have no meteorological observation [8]. Therefore, the accuracy of the interpolated Tair data is limited, and researchers are unable to obtain high-spatial-resolution Tair information [9].

Meteorological satellites such as low-Earth-orbit (LEO) satellites and geostationary-Earth-orbit (GEO) satellites can provide continuous surface (i.e., land-surface temperature (LST)) and atmospheric observations with wide spatial coverage at global and regional scales [10–12]. In the last several decades, LEO and GEO observations have been gradually applied to Tair estimation with the development of meteorological satellite technology. LEO satellites can only acquire data once or twice a day for a given location. In addition, cloud contamination reduces the effective data for Tair estimation [13–15]. Unlike LEO satellites, GEO meteorological satellites can continuously provide data every 15 or 30 min over one-third of the Earth’s surface [16–20]. Therefore, GEO satellites offer an effective means of obtaining high-spatial- and high-temporal-resolution Tair data over a fixed area and have the potential to facilitate the study of the daily change of Tair [20, 21].

At present, the methods for Tair estimation from satellite brightness temperatures (BTs) and land-surface temperature (LST) product data can be divided into simple linear, multivariate linear, and nonlinear approaches [21, 22]. Previous studies [7, 23, 24] have shown that machine-learning algorithms can obtain more accurate Tair values than other methods. For example, a neural network (NN) model reduced the root-mean-square error (RMSE) by 1.29°C compared with linear models [7].

The AGRI aboard Fengyun-4A (FY-4A) has 14 spectral bands [18, 20, 25, 26]—six visible/near-infrared (VIS/NIR), six infrared (IR), and two water vapor bands—with a temporal resolution of 15 min for the full disk and a spatial resolution of 4 km at IR bands. It provides an unprecedented opportunity for obtaining high-precision Tair data over China and surrounding areas.

Machine-learning methods have been used to estimate Tair based on moderate-resolution imaging spectroradiometer (MODIS) data in several studies [27–29]. However, there is currently a lack of relevant studies on Tair estimation based on FY-4A. The use of FY-4A data to estimate high-resolution Tair is of great significance to the study of human health and high-temporal- and high-spatial-resolution Tair in East Asia. In addition, there is a need for timely and high-resolution Tair data for the sustainable planning and management of climate-resilient cities [3].

This study aims to develop machine-learning approaches for Tair estimation using FY-4A data and to compare the performances of different machine-learning models [i.e., multivariate linear regression (MLR), gradient boosting decision tree (GBTD), k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGB), and deep neural network (DNN)] in Tair estimation, which, to the best of our knowledge, has not been done before. By comparing different machine-learning algorithms, an algorithm with good applicability for estimating Tair is selected. The algorithm is widely applicable to meteorological satellites without surface-temperature products.

The remainder of this paper is organized as follows. In Section 2, the study area and data used for model development are introduced, and the construction of the above-listed six machine-learning models for Tair estimation is described. Variable importance analysis, validation results, and discussion are described in Section 3. Conclusions are presented in Section 4.

2. Materials and Methods

2.1. Study Area

The study area is China, and Figure 1 shows the spatial distribution of the 1,812 meteorological stations used in this study. Elevation is higher in western China than in the east; the Qinghai-Tibet Plateau has an average elevation of over 4,000 m [30]. There are more stations in the eastern areas than in the western ones due to the uneven distribution of population and economic development in China (Figure 1).

2.2. Data

The data used in this study mainly include FY-4A/AGRI brightness temperature (BT) and L2 cloud mask data, global forecast system (GFS) 3 h forecast data, meteorological data of 1,812 stations in China, and other auxiliary data (longitude, latitude, and Julian day).

2.2.1. Satellite Data

FY-4A, the new generation of Chinese geostationary meteorological satellites, was launched on December 11, 2016. It was positioned at 99.5°E above the equator. As thermal infrared split-window channels, bands 12 and 13 of AGRI (BT12 and BT13, respectively) are mainly used for studies of cloud, aerosol, and Tair estimation. Their central wavelengths are 10.8 and 12.0 μm, respectively [31].

BT12, BT13, and L2 cloud mask products during summer 2018 (i.e., June, July, and August) were used. The AGRI data were selected at 3 h intervals (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC) per day. The data were downloaded from the China National Satellite Meteorological Center (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx).

2.2.2. Meteorological Data

This study selected meteorological data at 3 h intervals from 1,812 observation stations in China during summer 2018. The station variables used in this study include Tair and elevation from the digital elevation model (DEM). Tair in summer 2018 ranged from −5°C to 40°C, and the DEM of the stations was between 0 and 5,000 m. These data were obtained from the China Meteorological Data Service Center (CMDC) (http://data.cma.cn/).

2.2.3. Numerical Weather Prediction Data and Auxiliary Data

Previous studies showed that the relationship between BTs (or LST) and Tair is easily affected by surface characteristics and atmospheric conditions [7, 31]. Therefore, the accuracy of Tair estimation can be effectively improved by adding several auxiliary parameters [32]. In this study, GFS 3 h forecast fields of precipitable water vapor (GFS PWV) and relative humidity (GFS RH) were used. The forecast length of the GFS data (GFS PWV and GFS RH) used was 3 h, and there were eight periods of data per day (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC). The GFS data were interpolated to the location and time of the AGRI pixels. GFS data were obtained through the U.S. National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (http://www.nco.ncep.noaa.gov/pmb/products/gfs). Table 1 presents the temporal and spatial resolution information of the data used in this study.

2.3. Methods

2.3.1. Preparation of Training Dataset

The BT12, BT13, GFS PWV, GFS RH, and auxiliary data were used as the input variables, Tair was used as the response variable of the machine-learning models (Table 1), and all data points (across space and time) were included in one model (i.e., the XGB model) [33]. The construction of representative training data is crucial for developing successful retrieval models using machine learning. Thus, data from June to August, except the 1st, 10th, 20th, and 30th of each month, were collected as the original dataset, and the original dataset was randomly divided into a training dataset (80%, 971,773 samples) and a test dataset (20%, 242,944 samples) with the same number of samples for each bin (i.e., 1.0°C in temperature), as shown in Figure 2. For validation, the data from the 1st, 10th, 20th, and 30th of June, July, and August, which were not used for training, were selected.
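The partition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout (keys `day` and `tair`), the function name `split_samples`, the random seed, and the choice to apply the 80/20 split inside every 1.0°C bin are all hypothetical. Only the held-out days, the split ratio, and the bin width come from the text.

```python
import math
import random
from collections import defaultdict

HOLDOUT_DAYS = {1, 10, 20, 30}  # days of each month reserved for validation


def split_samples(samples, train_frac=0.8, bin_width=1.0, seed=0):
    """Split records into training, test, and validation lists.

    Each record is a dict with the hypothetical keys "day" (day of month)
    and "tair" (observed air temperature in deg C).
    """
    rng = random.Random(seed)
    validation = [s for s in samples if s["day"] in HOLDOUT_DAYS]
    remaining = [s for s in samples if s["day"] not in HOLDOUT_DAYS]

    # Group the remaining records into temperature bins of bin_width deg C.
    bins = defaultdict(list)
    for s in remaining:
        bins[math.floor(s["tair"] / bin_width)].append(s)

    # Shuffle each bin and split it with the same train fraction.
    train, test = [], []
    for key in sorted(bins):
        group = bins[key]
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test, validation
```

Splitting inside each bin keeps the temperature distribution of the training and test sets similar, which is one plausible reading of the per-bin sampling shown in Figure 2.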

2.3.2. Machine-Learning Algorithm

Machine-learning methods have been widely used in classification and regression in the field of remote sensing [34–41]. In this study, six machine-learning approaches, that is, MLR, GBTD, KNN, RF, XGB, and DNN, were used for constructing Tair estimation models. The flowchart of Tair estimation based on machine-learning approaches is shown in Figure 3. L2 cloud mask products were used to detect cloud. For cloud-free pixels, the FY-4A data were matched with both the GFS data and the meteorological station data (same location and time), and Tair was then estimated through the machine-learning models.

As a simple machine-learning algorithm, MLR has usually been the basic tool for the estimation of meteorological parameters [42, 43]. KNN, by contrast, is a local nonlinear algorithm whose prediction process is generally divided into two steps. First, when the KNN algorithm predicts a point, it searches the training dataset for the k neighbors nearest to that point. Second, the mean of the target variable of these k-nearest neighbors is computed [44, 45]. In this study, the hyperparameters of MLR and KNN were set to default values. Unlike MLR and KNN, RF is an ensemble of decision trees designed to improve prediction accuracy, such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [34, 43, 46–50]. GridSearchCV from the Python Scikit-learn library was used to tune the hyperparameters, including the number of trees (n_estimators), the minimum number of samples at a leaf node (min_samples_leaf), and the maximum depth of a tree (max_depth). The selected parameters were n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.
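The two-step KNN prediction can be illustrated with a short sketch. It is a toy version in plain Python rather than the library implementation used in the study; the function name `knn_predict`, the Euclidean distance, and the value k = 5 are assumptions made for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=5):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Step 1: find the k training points closest to the query (Euclidean distance).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # Step 2: average the target variable of those neighbors.
    return sum(train_y[i] for i in nearest) / len(nearest)
```

Because the prediction is always an average of observed targets, a model of this kind cannot produce values outside the range of the training data.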

The principle of GBTD is to sequentially apply a classification algorithm to weighted versions of the training data [51, 52], descending along the gradient direction of the loss function of the previously established model, and then perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models, that is, n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma), were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
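The idea of sequentially descending along the gradient of the loss can be made concrete with a toy regression example. The sketch below is not XGBoost or the GBTD implementation used in the study: it fits one-split trees (stumps) on a single feature under squared loss, where the negative gradient is simply the residual, and it omits regularization such as gamma. All function names are hypothetical.

```python
def fit_stump(x, residual):
    """Find the threshold on 1-D feature x that minimizes squared error.

    Requires at least two distinct values in x.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def fit_boosting(x, y, n_estimators=50, lr=0.1):
    """Fit stumps one after another to the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is the residual.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residual)
        stumps.append((t, lv, rv))
        # Take a small step (scaled by lr) in the direction of the new stump.
        pred = [p + lr * (lv if xi <= t else rv) for xi, p in zip(x, pred)]
    return base, lr, stumps


def predict_boosting(model, xi):
    """Sum the base value and the scaled contributions of all stumps."""
    base, lr, stumps = model
    return base + lr * sum(lv if xi <= t else rv for t, lv, rv in stumps)
```

A smaller lr needs more estimators to reach the same fit, which is why learning rate and n_estimators are usually tuned together.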

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure, which has the ability to learn time and space relationships [60, 61]. It adjusts the connection strengths between neurons through back-propagation and iteratively minimizes the prediction error [62–64]. The DNN model was tested with one to five hidden layers and 5–200 neurons per hidden layer in five intervals. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size, 128; dropout_rate, 0.1; stop_steps, 20 (training was terminated if the validation-set loss did not improve within 20 steps); and learning rate, 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
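The stopping rule controlled by stop_steps can be sketched independently of any deep-learning framework. In the illustration below, the list `val_losses` stands in for the validation loss measured after each training step, and the function name is hypothetical. Whether a step means an epoch or a mini-batch update is not stated in the text, so the sketch treats it as an abstract counter.

```python
def train_with_early_stopping(val_losses, stop_steps=20):
    """Return (best_step, best_loss, steps_run) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `stop_steps`
    consecutive steps.
    """
    best_loss = float("inf")
    best_step = -1
    steps_run = 0
    for step, loss in enumerate(val_losses):
        steps_run = step + 1
        if loss < best_loss:
            best_loss, best_step = loss, step
        elif step - best_step >= stop_steps:
            break  # no improvement for stop_steps consecutive steps
    return best_step, best_loss, steps_run
```

In practice the model weights from `best_step` would be restored after stopping, so that the final model is the one with the lowest validation loss.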

2.4. Error Analyses

Four statistical factors, namely the determination coefficient (R2), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation models. In these metrics, Tea is the estimated Tair, Toa is the observed Tair at the meteorological stations, and N is the sample size.
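For reference, the standard forms of these metrics, written with the symbols above, are sketched below. The sample index i and the observed mean \bar{T}_{oa} are notation introduced here. The sign convention of the bias (estimated minus observed) and the choice of R2 as the coefficient of determination are assumptions, since alternative conventions exist.

```latex
\begin{align}
\mathrm{bias} &= \frac{1}{N}\sum_{i=1}^{N}\left(T_{ea,i}-T_{oa,i}\right), \\
\mathrm{MSE}  &= \frac{1}{N}\sum_{i=1}^{N}\left(T_{ea,i}-T_{oa,i}\right)^{2}, \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}, \\
R^{2} &= 1-\frac{\sum_{i=1}^{N}\left(T_{ea,i}-T_{oa,i}\right)^{2}}
               {\sum_{i=1}^{N}\left(T_{oa,i}-\bar{T}_{oa}\right)^{2}},
\qquad \bar{T}_{oa}=\frac{1}{N}\sum_{i=1}^{N}T_{oa,i}.
\end{align}
```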

3. Results and Discussion

In this section, the results of variable importance are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results

Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As shown in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had stronger correlations with Tair than the other variables, and the R values of the four variables were 0.635, −0.596, 0.459, and 0.413, respectively. This indicated that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient describes only the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. The GFS PWV was identified as the most important variable for Tair estimation in the RF model, while the GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which was consistent with the previous study [65].
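The limitation of the Pearson coefficient noted above can be demonstrated with a few lines of Python. The function `pearson_r` below is a plain implementation of the textbook formula, and the data are synthetic values chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * a + 1.0 for a in x]   # exactly linear in x
quadratic = [a * a for a in x]        # fully determined by x, but not linear

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
```

Here `r_linear` is 1 up to rounding, while `r_quadratic` is 0 even though the second variable depends entirely on x, which is why a model-based importance measure is a useful complement.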

3.2. Model Performance Results

For evaluating the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. For the selected value of K (here, K = 10), the dataset was randomly and equally divided into K groups. One group was held out for testing, and the remaining K − 1 groups were used for training. Over a total of K validations, the model performance was calculated using a different test fold each time [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
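The fold construction can be sketched as follows. This is a minimal stdlib illustration with hypothetical function names, not the routine used in the study. The callable `score_fn` stands in for training a model on the training indices and returning a score such as RMSE on the held-out fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k groups of near-equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, score_fn, k=10):
    """Average a user-supplied score over the k folds."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Every sample appears in exactly one test fold, so the averaged score uses each observation for testing once and for training K − 1 times.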

Figure 5 compares the six models in terms of different statistical parameters, including RMSE, bias, MSE, and R2. The MLR model had the lowest performance of the six models. The variation ranges of RMSE, bias, MSE, and R2 in the MLR model were quite wide; for example, its RMSE ranged from 1.602°C to 4.487°C. By contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The RMSE of the DNN model ranged from 0.852°C to 2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and better than those of the MLR, KNN, and RF models.

3.3. Validation Results

Model performance was used as an indicator to internally validate each model. The model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, the observed data used for neither training nor testing were utilized (validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair on the test dataset was 1.736°C, while that on the validation dataset was 2.006°C. This difference may be caused by overfitting because the best model was not selected based on the final validation results [35].

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. The reason that the KNN model had a larger negative bias may be that it had poor robustness. Robustness mainly depended on the dataset, and poor robustness made the model difficult to directly apply to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance with R2 of 0.902. The R2 values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R2 value of the remaining three models was less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed a relatively better performance in the validation dataset in most sites. In general, the XGB model showed a higher overall performance than the other five models on the validation dataset.

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the applicability of these models, the spatial distribution of the errors at each meteorological station was evaluated (Figures 7–9).

The Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE in Guangdong Province of the XGB model was approximately 1.2°C–1.8°C, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China because they can generate repeated weighted averages to adjust to different regions through repeated learning from numerous data.

Furthermore, Gong’s study (2015) [66] illustrated that the RMSE of GFS Tair reached 1.5°C–3.0°C in most eastern regions and was above 3.5°C in the northwestern regions. By contrast, the results of the present study showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was clearly lower than that of GFS data. The RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions and below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.

The six models showed the same distribution trend, as shown in Figure 8: R2 was higher in the eastern regions and gradually decreased toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA is, the more strongly the radiation reaching the sensor is affected by the atmosphere, which may cause differences in R2 of the estimated Tair value between the southwestern and central regions.

For the MLR model, the bias over all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot adequately simulate the complex Tair changes in China, resulting in underfitting. In addition, Tair was overestimated by the DNN model in northwestern China, which explains why the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranged from 2.0°C to 3.0°C. Overall, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, the XGB model is expected to provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was clearly improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the input variables, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced as input variables, the RMSE of the XGB model improved by 0.228°C compared with the case in which only GFS data were introduced. This indicates that both GFS data and satellite observation data play an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].

The relationship of XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low while exhibiting a negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations limits the applicability of the model in high-altitude areas. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical weather prediction data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with GFS data reached 1.946°C.

However, this study has limitations. The dataset used is restricted to clear-sky conditions. In addition, machine-learning algorithms cannot infer beyond the range of the observed Tair values. If Tair moves beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance-to-coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank National Satellite Meteorological Center (NSMC) for providing FY-4A data, China Meteorological Data Service Center (CMDC) for the meteorological data, and National Centers for Environmental Prediction (NCEP) for the GFS data.