A novel method for optimizing air temperature estimation and quantifying canopy layer heat island intensity in eastern and central China
Introduction
In recent decades, the world has experienced notable climatic change arising from rapid urbanization, which affects regional climate, vegetation phenology and human health (Li et al., 2020a, Grimm et al., 2008, Zhang et al., 2004, Harvell et al., 2002). Air temperature, as one of the key variables for investigating climate change, has attracted extensive attention (Ren et al., 2007a, Ren et al., 2007b, Ren and Zhou, 2014, Wen et al., 2013, Li and Zha, 2019). In eastern China in particular, the five hottest summers all occurred in the twenty-first century, and the mean summer (June-August) temperature has increased by 0.82 °C since the 1950s (Sun et al., 2014), causing substantial economic and societal impacts. Central China also experienced warming, with trends ranging from −0.3 °C to 1.0 °C/100 years during 1909–2010 (Cao et al., 2013). Unfortunately, these studies were generally constrained by sparse weather stations, which provide only limited information about the spatial distribution of air temperature. Therefore, accurate air temperature estimation over wide areas is of great significance for scientific research and public services.
Many promising approaches to estimating air temperature have been studied, mainly interpolation, simulation and regression (Yang, 2004, Taheri-Shahraiyni and Sodoudi, 2016, Yao et al., 2020, Li and Zha, 2018). As a simple and effective method, the regression model facilitates the estimation of air temperature at scales from urban to national with low estimation errors (usually 1–3 °C). Among regression models, the random forest (RF) regression model has great potential in air temperature estimation due to its strong robustness and high accuracy. For instance, the RF regression model produced the lowest estimation error for air temperature in comparison with ordinary least squares regression and support vector machines (Ho et al., 2014). In addition, this model also performed well in national and city-scale climate research (Li et al., 2017, Li and Zha, 2018). To date, many predictors (e.g. satellite-derived predictors, basic geographic information and meteorological data) have been considered in estimating air temperature using regression models (Ho et al., 2014, Ho et al., 2016, Li and Zha, 2018, Li and Zha, 2019, Yao et al., 2020, Otgonbayar et al., 2019, Chen et al., 2015, Shen et al., 2020). However, there is currently no framework for selecting optimal predictors for air temperature estimation. Most of these predictors are selected based on expert opinion.
Previous studies usually selected predictors based on the user's experience. Because many predictors affect air temperature, it is not clear which combination of variables produces the best model performance. If all available parameters are used, there is likely to be correlated and redundant information, which could reduce the accuracy of air temperature estimation. This phenomenon can be explained by the Hughes effect (Hughes, 1968). The genetic algorithm (GA), a global probabilistic search algorithm based on the mechanisms of natural selection and evolution, has been extensively employed for feature selection and the optimization of algorithms (Yang and Honavar, 1998, Chen et al., 2013). A previous study showed that the GA performed well in selecting optimal conditioning factors in binary classification problems such as shallow translational landslide susceptibility mapping (Kavzoglu et al., 2015). However, to our knowledge, the selection of optimal predictors for air temperature estimation remains an open problem. In addition, previous studies directly used weather stations located in urban and surrounding rural areas to estimate canopy layer heat island intensity (CLHII) (Ren et al., 2007a, Ren et al., 2007b, Chapman et al., 2017), an approach that may be strongly affected by the representativeness of the weather stations. An assessment of the difference between site-based and simulation-based CLHII would help to clarify the uncertainty of site-based CLHII quantification, yet it is still lacking.
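To make the site-based versus estimate-based comparison concrete, the following is a minimal sketch. It assumes that CLHII is taken as the mean urban air temperature minus the mean rural air temperature, which is the conventional definition; the snippet above does not state the exact formula used in this study. The function name `clhii` and all temperature values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def clhii(urban_temps, rural_temps):
    """Mean urban minus mean rural air temperature, in °C (assumed definition)."""
    if not urban_temps or not rural_temps:
        raise ValueError("need at least one urban and one rural value")
    return mean(urban_temps) - mean(rural_temps)


# Hypothetical values for illustration only (not taken from the study).
station_urban = [24.1, 23.8, 24.5]       # observed at urban weather stations
station_rural = [22.9, 23.2]             # observed at rural weather stations
grid_urban = [24.0, 23.9, 24.2, 24.3]    # estimated at urban grid cells
grid_rural = [23.0, 23.1, 22.8, 23.3]    # estimated at rural grid cells

site_based = clhii(station_urban, station_rural)
estimate_based = clhii(grid_urban, grid_rural)

# The gap between the two values indicates how strongly the choice of
# stations can influence the quantified intensity.
difference = site_based - estimate_based
print(f"site-based: {site_based:.2f} °C, estimate-based: {estimate_based:.2f} °C")
```

With only a few stations, one unrepresentative site shifts the site-based value noticeably, whereas the estimate-based value averages over every urban and rural grid cell.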
Taking eastern and central China as a case study, the primary objective of this study is to seek the best combination of predictors by applying GA-based feature selection, and to reanalyze the quantification of CLHII. RF was implemented to generate the spatial distribution of air temperature from the subsets (i.e. factor combinations) selected by the GA. We also analyze the relationship between model performance and the number of predictors. Model performance was evaluated based on the root mean square error (RMSE). Finally, we compare the quantification of CLHII based on observed and estimated air temperature.
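The workflow outlined above (a GA proposes predictor subsets, a model is scored on each subset by RMSE) can be sketched as follows. This is an illustrative sketch only: the operators (binary masks, tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions, since the snippet does not specify them. The function names `ga_select` and `toy_fitness` are hypothetical, and `toy_fitness` is a deliberately simple stand-in for training an RF model on the selected predictors and returning its RMSE.

```python
import math
import random


def rmse(observed, estimated):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))


def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a feature mask that minimizes `fitness`; returns (mask, score)."""
    rng = random.Random(seed)

    def repair(mask):
        # Guarantee that at least one predictor is selected.
        if not any(mask):
            mask[rng.randrange(n_features)] = True
        return mask

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [repair([rng.random() < 0.5 for _ in range(n_features)])
                  for _ in range(pop_size)]
    best_score, best_mask = math.inf, None
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        for score, mask in scored:
            if score < best_score:
                best_score, best_mask = score, mask[:]
        children = [best_mask[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(repair(child))
        population = children
    return best_mask, best_score


# Synthetic data (hypothetical): the target depends on predictors 0, 2 and 5 only.
data_rng = random.Random(42)
N_FEATURES = 8
INFORMATIVE = (0, 2, 5)
X = [[data_rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(50)]
y = [sum(row[j] for j in INFORMATIVE) for row in X]


def toy_fitness(mask):
    # Stand-in for "fit a model on the selected predictors and return its RMSE".
    estimates = [sum(v for v, keep in zip(row, mask) if keep) for row in X]
    return rmse(y, estimates)


best_mask, best_rmse = ga_select(N_FEATURES, toy_fitness)
selected = [j for j, keep in enumerate(best_mask) if keep]
print("selected predictors:", selected, "RMSE:", round(best_rmse, 4))
```

In a real application, `toy_fitness` would be replaced by a function that trains the regression model on the masked predictors and returns a validation RMSE. The GA loop itself would not need to change.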
Section snippets
Data and study area
Temperature and relative humidity, measured at 1.5 m above the ground, were collected from 1056 weather stations in eastern and central China at 14:00 on October 9, 2016 (http://data.cma.cn/site/index.html). The study area (Fig. 1) covers 15 provinces/cities: Beijing, Shanxi, Tianjin, Hebei, Shandong, Henan, Anhui, Hubei, Shanghai, Jiangsu, Zhejiang, Hunan, Jiangxi, Fujian and Guangdong. Basic geographic information (point or line features) was prepared, including river system (third-, fourth-,
Regression variables
Owing to the advantages of easy acquisition, continuous observation and wide coverage, satellite data have become an important data source for air temperature estimation (Benali et al., 2012, Chen et al., 2015). For example, satellite spectral bands, land cover, vegetation indices and nighttime lights have been used in temperature estimation, given that they are correlated with air temperature (Yoo et al., 2018, Li and Zha, 2018). On the other hand, latitude is the basic factor that determines the
Model performance
The RMSE between the observed and the estimated air temperature is shown in Table 2. The results show that, except for the models based on one or two predictors, the RMSEs of the models are <1.3 °C. The errors could be caused by many potential factors that are not considered in the models, such as wind speed, precipitation, anthropogenic heat, land surface temperature and clouds. In contrast, the RMSE reported in previous studies is roughly between
Conclusion
Determining the predictors that contribute most is of considerable importance in regression models for mapping the spatial distribution of air temperature when many predictors related to air temperature are available. Among the GA-selected predictor sets, the lowest estimation error (1.21 °C) was obtained, with good fitting precision (R² = 0.9662), for the GA-5 model (which includes latitude, relative humidity, DEM, DtPreC and DtTS). A complete set of the regression variables of air temperature is an essential
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research was funded by Strategic pilot science and technology projects, the Chinese Academy of Sciences, Type A, grant number XDA20100314.
References (40)
- et al. Estimating air surface temperature in Portugal using MODIS LST data. Remote Sens. Environ. (2012)
- et al. Mapping maximum urban air temperature on hot summer days. Remote Sens. Environ. (2014)
- et al. Comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (humidex), for the Greater Vancouver area. Sci. Total Environ. (2016)
- et al. Selecting optimal conditioning factors in shallow translational landslide susceptibility mapping using genetic algorithm. Eng. Geol. (2015)
- et al. Quantifying the spatiotemporal trends of canopy layer heat island (CLHI) and its driving factors over Wuhan, China with satellite remote sensing. Remote Sens. (2017)
- et al. Mapping relative humidity, average and extreme temperature in hot summer over China. Sci. Total Environ. (2018)
- et al. Satellite-based spatiotemporal trends of canopy urban heat islands and associated drivers in China’s 32 major cities. Remote Sens. (2019)
- et al. Developing a temporally accurate air temperature dataset for Mainland China. Sci. Total Environ. (2020)
- et al. Estimation of daily maximum and minimum air temperatures in urban landscapes using MODIS time series satellite data. ISPRS J. Photogramm. Remote Sens. (2018)
- et al. A genetic algorithm for the set covering problem. J. Oper. Res. Soc. (1996)
- Mesoscale climatic simulation of surface air temperature cooling by highly reflective greenhouses in SE Spain. Environ. Sci. Technol.
- Instrumental temperature series in eastern and central China back to the nineteenth century. J. Geophys. Res. Atmos.
- Can the crowdsourcing data paradigm take atmospheric science to a new level? A case study of the urban heat island of London quantified using Netatmo weather stations. Int. J. Climatol.
- A statistical method based on remote sensing for the estimation of air temperature in China. Int. J. Climatol.
- The application of the genetic adaptive neural network in landslide disaster assessment. J. Mar. Sci. Technol.
- Global change and the ecology of cities. Science
- Climate warming and disease risks for terrestrial and marine biota. Science
- On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory
- Genetic algorithms for the travelling salesman problem: a review of representations and operators. Artif. Intell. Rev.
- Using prophet forecasting model to characterize the temporal variations of historical and future surface urban heat island in China. J. Geophys. Res. Atmos.