Elsevier

Atmospheric Research

Volume 248, 15 January 2021, 105146

Estimation of hourly full-coverage PM2.5 concentrations at 1-km resolution in China using a two-stage random forest model

https://doi.org/10.1016/j.atmosres.2020.105146

Highlights

  • A two-stage random forest model was developed to estimate hourly full-coverage PM2.5 at 1 km resolution in China.

  • Gap-filled AOD was generated using AHI, MAIAC and CAMS AOD.

  • Estimation of PM2.5 concentrations achieved relatively high accuracies.

  • CAMS PM2.5 simulations, elevation and gap-filled AOD greatly contributed to the model performance.

Abstract

Fine particulate matter such as PM2.5 has been the focus of increasing public concern because of its adverse effects on the environment and human health. However, existing efforts to map PM2.5 concentrations are often limited by coarse spatial resolution and low temporal frequency. To address this shortcoming, we estimated hourly PM2.5 concentrations at 1-km spatial resolution in China from March 2018 to February 2019 using a two-stage random forest model. In the first stage, we used a gap-filling method to generate full-coverage Aerosol Optical Depth (AOD) by fusing AOD data from satellites (Himawari-8 and MODIS) and a weather forecast model (CAMS) with additional meteorological and geographical variables. The gap-filled AOD was subsequently used to estimate hourly PM2.5 in Stage II. Results showed that our model achieved accurate and robust estimations of PM2.5 concentrations, with an overall cross-validated R2 of 0.85, root mean squared error of 11.02 μg/m3, and mean absolute error of 6.73 μg/m3. CAMS-simulated PM2.5, elevation, and gap-filled AOD were identified as important variables contributing to the performance of the PM2.5 estimation model. Model performance varied at the daily temporal scale: the daily estimation model performed better in spring and winter but worse in summer and autumn. We provide an alternative approach for generating spatially and temporally explicit maps of PM2.5 concentrations at fine resolutions, making real-time monitoring of air pollution possible. The detailed spatial heterogeneity and diurnal variability of PM2.5 concentrations will also be valuable for environmental exposure assessments.

Introduction

Fine particulate matter such as PM2.5 (particles with an aerodynamic diameter less than 2.5 μm) is one of the critical pollutants in the atmospheric environment. Numerous epidemiological studies have demonstrated that short- and long-term exposure to PM2.5 has adverse effects on public health, particularly cardiovascular and respiratory disease mortality and hospital admissions (Dominici et al., 2006; Kim et al., 2015; Pope et al., 2004). The Global Burden of Disease Study 2017 identified ambient particulate matter as the leading environmental risk factor for mortality (Stanaway et al., 2018). A number of serious air pollution events related to high PM2.5 concentrations have occurred in China over the past decade, leaving a majority of the population at risk of exposure to high PM2.5 concentrations (Chan and Yao, 2008; Qiu, 2014; Song et al., 2017). Information on the spatiotemporal dynamics of PM2.5 concentrations at fine resolutions is urgently needed for estimating actual long-term and short-term exposure to PM2.5 (Chen et al., 2018b). Although a PM2.5 monitoring network has operated in China since 2013, its ground-based stations are still sparsely and unevenly distributed (Chen et al., 2018a), which leads to a substantial knowledge gap between actual and estimated PM2.5 exposure, especially in rural areas.

Fortunately, remote sensing can quickly provide large-scale and timely information about the spatial variability of the atmospheric environment and the terrestrial surface (Chen et al., 2017). Aerosol optical depth (AOD), defined as the integral of aerosol extinction over the total atmospheric column, is one of the most widely used satellite-derived parameters for estimating PM2.5 concentrations. Owing to the high correlation between PM2.5 and AOD (Wang and Christopher, 2003), numerous studies have worked on satellite-based estimation of PM2.5 concentrations (Brokamp et al., 2018; Liang et al., 2018; Ma et al., 2016; van Donkelaar et al., 2016; Xiao et al., 2017). AOD represents the extinction capability of the total atmospheric column over a given area, but it does not directly measure the concentration of fine particulate matter. Therefore, AOD measurements are usually combined with ancillary meteorological and geographical variables to better estimate PM2.5 concentrations (Chen et al., 2018a; Chu et al., 2016; Guo et al., 2017). For example, a number of studies have used statistical models to describe the relationship between PM2.5 and AOD, including the generalized additive model (GAM) (Ma et al., 2016a), the linear mixed effect model (LME) (Zhang et al., 2019b; Zheng et al., 2016) and geographically (and temporally) weighted regression (He and Huang, 2018; Hu et al., 2013; van Donkelaar et al., 2016). More recently, machine learning models have been increasingly employed in this field because of their flexibility in handling complex nonlinear problems (Nabavi et al., 2019; Xiao et al., 2018a; Xu et al., 2018). Notably, the random forest (RF) approach has performed well in estimating PM2.5 concentrations (Hu et al., 2017; Zhang et al., 2018b; Stafoggia et al., 2019).
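The column-integral definition of AOD mentioned above can be written out explicitly. The symbols below are notation chosen here for illustration and are not taken from the paper: τ is the AOD at wavelength λ, β_ext is the aerosol extinction coefficient at height z, and z_TOA is the top of the atmosphere.

```latex
\tau(\lambda) = \int_{0}^{z_{\mathrm{TOA}}} \beta_{\mathrm{ext}}(\lambda, z)\,\mathrm{d}z
```

Because the integral runs over the whole column, two columns with the same τ can have very different near-surface extinction, which is one way to see why AOD alone does not determine ground-level PM2.5 and why ancillary variables are added.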

Although many advances have been made in PM2.5 estimation using satellite-based AOD products, two major limitations remain. Firstly, it is challenging for satellite-estimated PM2.5 to achieve fine spatial and temporal resolution simultaneously. Existing satellite-based PM2.5 estimates have separately achieved spatial resolutions from 1 km (Kloog et al., 2015; Liang et al., 2018; Zhang et al., 2018a) down to 500 m (Bai et al., 2016), and temporal resolutions from monthly (Huang et al., 2018) and daily (Brokamp et al., 2018; He and Huang, 2018) to hourly (Wang et al., 2017; Zhang et al., 2019b; Zhang et al., 2016b). However, it is unlikely that PM2.5 concentrations with both fine spatial and fine temporal resolution can be obtained from a single satellite AOD product. For example, AOD products from sensors aboard polar-orbiting satellites, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) (Levy et al., 2013), the Multi-angle Imaging SpectroRadiometer (MISR) (Kahn et al., 2009) and the Visible Infrared Imaging Radiometer Suite (VIIRS) (Jackson et al., 2013), have fine spatial resolutions but cannot support hourly PM2.5 estimation. In contrast, AOD products from geostationary satellites, such as those from the Advanced Himawari Imager (AHI) (Kikuchi et al., 2018) and the Geostationary Operational Environmental Satellite (GOES) (Zhang et al., 2013), have very high temporal frequencies but relatively coarse spatial resolutions.

Secondly, spatially and temporally continuous satellite-based PM2.5 estimation is hindered by the availability of valid AOD data, because AOD retrievals are missing or limited over areas with cloud or snow cover, high surface reflectance, or high aerosol loading (Engel-Cox et al., 2004). As a result, satellite-estimated daily average PM2.5 concentrations with gaps tend to be underestimated when compared with station-based measurements, which leads to considerable biases in subsequent exposure assessments (Lv et al., 2016; Xiao et al., 2017). To address this issue, a number of gap-filling strategies have been proposed, for example, simple interpolation (Ma et al., 2014), smoothing using the spatiotemporal autocorrelation of AOD (Yang and Hu, 2018), the sampling bias correction factor method (van Donkelaar et al., 2016), and multiple imputation (Xiao et al., 2017). However, these methods depend heavily on the number and spatial distribution of retrieved AOD and PM2.5 measurements. Although AOD products from weather forecast models have attracted some interest in this field because of their spatially and temporally complete and continuous coverage, their spatial resolutions are very coarse. Because of this shortcoming, a few studies have attempted to combine satellite-derived AOD and model-simulated AOD to fill gaps. For example, Stafoggia et al. (2019) imputed missing MAIAC AOD from CAMS in Italy, and Xiao et al. (2017) combined MAIAC with chemical transport model (CTM) simulations in the Yangtze River Delta, China.

The fusion of multi-source AOD products seems promising for addressing both of the aforementioned limitations at once. In this study, we proposed a two-stage random forest model to generate hourly PM2.5 estimations at 1-km spatial resolution using satellite-derived and model-simulated AOD. By taking advantage of the high spatial resolution of MAIAC AOD, the high temporal resolution of AHI AOD, and the complete spatial coverage of CAMS AOD, we first obtained a full-coverage hourly AOD product at 1-km resolution. Subsequently, a random forest model was developed using the derived gap-filled AOD product and ancillary datasets, including meteorological and geographical variables, to estimate hourly full-coverage PM2.5 concentrations in China from March 2018 to February 2019 (i.e., one full year for experimental tests).
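The two-stage structure described above can be sketched on synthetic data. This is a minimal illustration and not the authors' implementation: all data are randomly generated, the variable names (`cams_aod`, `sat_aod`, `meteo`, `at_station`) are hypothetical, the two satellite products are collapsed into a single `sat_aod` array, keeping observed satellite AOD where it exists is an assumption, and the scikit-learn hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 600  # number of synthetic cell-hours

# Synthetic stand-ins for the predictors (all values are made up).
cams_aod = rng.uniform(0.05, 1.5, n)       # model-simulated AOD: complete coverage
meteo = rng.normal(size=(n, 2))            # two generic meteorological covariates
true_aod = 0.9 * cams_aod + 0.05 * meteo[:, 0] + rng.normal(0, 0.02, n)
sat_aod = true_aod.copy()
sat_aod[rng.random(n) < 0.4] = np.nan      # gaps, e.g. under cloud or snow
pm25 = 40 * true_aod + 5 * meteo[:, 1] + rng.normal(0, 2, n)
at_station = rng.random(n) < 0.3           # PM2.5 is observed only at stations

# Stage I: learn satellite AOD from model AOD plus covariates where satellite
# AOD exists, then predict it where it is missing (assumption: observed
# satellite values are kept unchanged).
x1 = np.column_stack([cams_aod, meteo])
has_sat = ~np.isnan(sat_aod)
stage1 = RandomForestRegressor(n_estimators=50, random_state=0)
stage1.fit(x1[has_sat], sat_aod[has_sat])
filled_aod = np.where(has_sat, sat_aod, stage1.predict(x1))

# Stage II: learn PM2.5 from gap-filled AOD plus covariates at station cells,
# then predict PM2.5 for every cell-hour.
x2 = np.column_stack([filled_aod, meteo])
stage2 = RandomForestRegressor(n_estimators=50, random_state=0)
stage2.fit(x2[at_station], pm25[at_station])
pm25_hat = stage2.predict(x2)

# Impurity-based importance of each Stage II predictor (sums to 1).
importance = dict(zip(["filled_aod", "meteo_0", "meteo_1"],
                      stage2.feature_importances_))
```

In this sketch `filled_aod` has no missing values, so Stage II can produce a prediction for every cell-hour, and `importance` gives one way to rank predictors, comparable in spirit to the variable-contribution results reported in the highlights.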

Data and methods

The study area covers the whole of China, extending from 18.16°N to 53.50°N and from 73.45°E to 135.08°E, as shown in Fig. 1. The area was divided into grid cells of 0.01° × 0.01° (approximately 1 × 1 km2) using ArcMap 10.2, producing 9,618,318 cells in total. Table 1 summarizes the basic information on the AOD, PM2.5, meteorological and geographical variables used in this study.
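The gridding described above can be illustrated with a short sketch that maps a latitude and longitude to a cell of the 0.01° grid. The extent and cell size come from the text; the function names and the row/column convention (row 0 in the south, column 0 in the west) are hypothetical, and the kilometre conversion assumes a spherical Earth.

```python
import math

# Extent and cell size as stated in the text.
LAT_MIN, LAT_MAX = 18.16, 53.50
LON_MIN, LON_MAX = 73.45, 135.08
CELL_DEG = 0.01

# Rows and columns of the bounding rectangle (rounded to absorb float error).
N_ROWS = round((LAT_MAX - LAT_MIN) / CELL_DEG)
N_COLS = round((LON_MAX - LON_MIN) / CELL_DEG)


def cell_index(lat, lon):
    """Return (row, col) of the cell containing a point, or None if outside.

    Hypothetical convention: row 0 is the southernmost row and column 0
    the westernmost column.
    """
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        return None
    row = min(math.floor((lat - LAT_MIN) / CELL_DEG), N_ROWS - 1)
    col = min(math.floor((lon - LON_MIN) / CELL_DEG), N_COLS - 1)
    return row, col


def cell_size_km(lat):
    """Approximate (north-south, east-west) cell size in km at a latitude."""
    km_per_deg = 111.32  # approximate length of one degree of latitude
    ns = CELL_DEG * km_per_deg
    ew = CELL_DEG * km_per_deg * math.cos(math.radians(lat))
    return ns, ew
```

The bounding rectangle contains 3534 × 6163 = 21,780,042 cells, which is more than the 9,618,318 cells reported; the difference presumably reflects cells outside China's boundary being excluded, although the snippet does not say so. A 0.01° cell is about 1.11 km north to south everywhere, while its east-west width shrinks from roughly 1.06 km at the southern edge to roughly 0.66 km at the northern edge.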

Validation of AOD against AERONET AOD

We evaluated the accuracy of the AOD inputs and outputs in Stage I against AERONET AOD. Fig. 2 shows the evaluation results for AOD from AHI, MAIAC and CAMS against AERONET. AHI AOD performed well, with an R of 0.70 and an RMSE of 0.20 (Fig. 2a). Of the AHI matchups, 53.31%, 26.99% and 19.70% were within, above and below the expected error (EE) envelope, respectively. MAIAC AOD (Fig. 2b) performed similarly well, with an R of 0.68 and an RMSE of 0.14 against AERONET. There were 55.02%, 30.48% and 14.50% of
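The validation statistics used in this section (R, RMSE, and the fractions of matchups within, above and below EE) can be computed as in the following sketch. The EE envelope is not defined in this snippet, so the form ±(0.05 + 0.15 × AERONET AOD) used here is an assumption borrowed from common practice in AOD validation, exposed through the hypothetical parameters `ee_abs` and `ee_rel`; the sample values are made up.

```python
import math


def validate_aod(retrieved, reference, ee_abs=0.05, ee_rel=0.15):
    """Compare retrieved AOD with reference (e.g. AERONET) AOD.

    Returns Pearson R, RMSE and the fractions of matchups within, above
    and below an assumed envelope of +/-(ee_abs + ee_rel * reference).
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_ret = sum(retrieved) / n
    cov = sum((x - mean_ref) * (y - mean_ret)
              for x, y in zip(reference, retrieved))
    var_ref = sum((x - mean_ref) ** 2 for x in reference)
    var_ret = sum((y - mean_ret) ** 2 for y in retrieved)
    r = cov / math.sqrt(var_ref * var_ret)
    rmse = math.sqrt(sum((y - x) ** 2
                         for x, y in zip(reference, retrieved)) / n)

    within = above = below = 0
    for x, y in zip(reference, retrieved):
        ee = ee_abs + ee_rel * x
        if y > x + ee:
            above += 1
        elif y < x - ee:
            below += 1
        else:
            within += 1
    return {"r": r, "rmse": rmse, "within": within / n,
            "above": above / n, "below": below / n}


# Made-up matchups for illustration only.
reference = [0.1, 0.3, 0.5, 0.8]
retrieved = [0.12, 0.45, 0.40, 0.60]
stats = validate_aod(retrieved, reference)
```

The three fractions always sum to one, so a retrieval with a large "above" share, as reported here for both AHI and MAIAC, tends to overestimate AOD relative to AERONET more often than it underestimates it.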

Comparison with previous studies

Our estimation model successfully provided hourly full-coverage PM2.5 estimations at 1-km spatial resolution with an overall cross-validated R2 of 0.85, RMSE of 11.02 μg/m3 and MAE of 6.73 μg/m3, which was comparable to previous studies (Table 3). Previous studies showed a similarly good ability to estimate PM2.5, with cross-validation R2 ranging from 0.61 to 0.87 and RMSE ranging from 15.57 to 28.68 μg/m3. The latest released MAIAC AOD product has not been widely used so far. When estimating hourly
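The three scores quoted above (R2, RMSE and MAE) can be computed from paired observations and held-out predictions as sketched below. The snippet does not state how R2 was defined or how the cross-validation folds were built, so the coefficient-of-determination form of R2 and the simple strided fold split are assumptions, and the sample values are made up.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2 (as 1 - SS_res / SS_tot), RMSE and MAE for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


def k_fold_indices(n, k):
    """Split range(n) into k folds by striding (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


# Made-up PM2.5 values in ug/m3: observations and held-out predictions.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = regression_metrics(observed, predicted)
```

In a cross-validation setting, each sample would be predicted by a model fitted without the fold that contains it, and the pooled held-out predictions would then be passed to `regression_metrics`. RMSE and MAE carry the units of PM2.5, which is why they can be compared across studies only when the concentration ranges are similar.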

Conclusions

Hourly full-coverage PM2.5 estimations at 1-km spatial resolution across China were generated in this study using a two-stage random forest model. The model integrated satellite-derived and weather-model-simulated data with meteorological and geographical information. AHI and MAIAC AOD agreed well with ground-based measurements. However, satellite-based AOD was usually biased under extremely high or low aerosol loadings. In Stage I, we created gap-filled AOD using AHI AOD with fine

Declaration of Competing Interest

None.

Acknowledgments

This research was partially supported by the National Research Program of the Ministry of Science and Technology of the People's Republic of China (2016YFA0600101, 2016YFA0600104), and donations from Delos Living LLC and the Cyrus Tang Foundation to Tsinghua University.

We thank the Japan Aerospace Exploration Agency (JAXA) and NASA for making the Himawari-8/AHI and MODIS aerosol products publicly available, and the AERONET site Principal Investigators for making the aerosol data publicly available.

References (81)

  • J. He et al.

    Air pollution characteristics and their relation to meteorological conditions during 2014–2015 in major Chinese cities

    Environ. Pollut.

    (2017)
  • X. Hu et al.

    Estimating ground-level PM2.5 concentrations in the southeastern U.S. using geographically weighted regression

    Environ. Res.

    (2013)
  • H. Hu et al.

    Satellite-based high-resolution mapping of ground-level PM2.5 concentrations over East China using a spatiotemporal regression kriging model

    Sci. Total Environ.

    (2019)
  • K. Huang et al.

    Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North China Plain

    Environ. Pollut.

    (2018)
  • K.-H. Kim et al.

    A review on the human health impact of airborne particulate matter

    Environ. Int.

    (2015)
  • I. Kloog et al.

    Estimating daily PM2.5 and PM10 across the complex geo-climate region of Israel using MAIAC satellite-based AOD data

    Atmos. Environ.

    (2015)
  • R. Li et al.

    Using MAIAC AOD to verify the PM2.5 spatial patterns of a land use regression model

    Environ. Pollut.

    (2018)
  • F. Liang et al.

    MAIAC-based long-term spatiotemporal trends of PM2.5 in Beijing, China

    Sci. Total Environ.

    (2018)
  • J. Liu et al.

    Satellite-based PM2.5 estimation directly from reflectance at the top of the atmosphere using a machine learning algorithm

    Atmos. Environ.

    (2019)
  • Z. Ma et al.

    Satellite-derived high resolution PM2.5 concentrations in Yangtze River Delta Region of China using improved linear mixed effects model

    Atmos. Environ.

    (2016)
  • A. Mhawish et al.

    Comparison and evaluation of MODIS Multi-angle Implementation of Atmospheric Correction (MAIAC) aerosol product over South Asia

    Remote Sens. Environ.

    (2019)
  • S.O. Nabavi et al.

    Assessing PM2.5 concentrations in Tehran, Iran, from space using MAIAC, deep blue, and dark target AOD and machine learning algorithms

    Atmos. Pollut. Res.

    (2019)
  • C. Song et al.

    Health burden attributable to ambient PM2.5 in China

    Environ. Pollut.

    (2017)
  • M. Stafoggia et al.

    Estimation of daily PM10 and PM2.5 concentrations in Italy, 2013-2015, using a spatiotemporal land-use random-forest model

    Environ. Int.

    (2019)
  • A.P.K. Tai et al.

    Correlations between fine particulate matter (PM2.5) and meteorological variables in the United States: Implications for the sensitivity of PM2.5 to climate change

    Atmos. Environ.

    (2010)
  • W. Wang et al.

    Two-stage model for estimating the spatiotemporal distribution of hourly PM1.0 concentrations over central and East China

    Sci. Total Environ.

    (2019)
  • Q. Xiao et al.

    Full-coverage high-resolution daily PM2.5 estimation using MAIAC AOD in the Yangtze River Delta of China

    Remote Sens. Environ.

    (2017)
  • L. Xiao et al.

    High-resolution spatiotemporal mapping of PM2.5 concentrations at mainland China using a combined BME-GWR technique

    Atmos. Environ.

    (2018)
  • Y. Xu et al.

    Evaluation of machine learning techniques with multiple remote sensing datasets in estimating monthly concentrations of ground-level PM2.5

    Environ. Pollut.

    (2018)
  • J. Yang et al.

    Filling the missing data gaps of daily MODIS AOD using spatiotemporal interpolation

    Sci. Total Environ.

    (2018)
  • L. Zang et al.

    Estimating hourly PM1 concentrations from Himawari-8 aerosol optical depth in China

    Environ. Pollut.

    (2018)
  • Y. Zhan et al.

    Spatiotemporal prediction of continuous daily PM2.5 concentrations across China using a spatially explicit machine learning algorithm

    Atmos. Environ.

    (2017)
  • X. Zhang et al.

    Predicting daily PM2.5 concentrations in Texas using high-resolution satellite aerosol optical depth

    Sci. Total Environ.

    (2018)
  • R. Zhang et al.

    A nonparametric approach to filling gaps in satellite-retrieved aerosol optical depth for estimating ambient PM2.5 levels

    Environ. Pollut.

    (2018)
  • Z. Zhang et al.

    Evaluation of MAIAC aerosol retrievals over China

    Atmos. Environ.

    (2019)
  • T. Zhang et al.

    Ground-level PM2.5 estimation over urban agglomerations in China with high spatiotemporal resolution based on Himawari-8

    Sci. Total Environ.

    (2019)
  • Y. Zheng et al.

    Estimating ground-level PM2.5 concentrations over three megalopolises in China using satellite-derived aerosol optical depth measurements

    Atmos. Environ.

    (2016)
  • Y. Bai et al.

    A geographically and temporally weighted regression model for ground-level PM2.5 estimation from satellite-derived 500 m resolution AOD

    Remote Sens.

    (2016)
  • J.H. Belle et al.

    The potential impact of satellite-retrieved cloud parameters on ground-level PM2.5 mass and composition

    Int. J. Environ. Res. Public Health

    (2017)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)