A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa

https://doi.org/10.1016/j.rse.2021.112713

Highlights

  • We developed a random forest model to estimate daily PM2.5 concentrations at 1 km2 resolution in South Africa.

  • Our model captured seasonal trends and spatial patterns of PM2.5 with relatively high accuracy.

  • High PM2.5 levels were identified in low-income settlements and industrial areas in western Mpumalanga.

  • PM2.5 levels decreased in the north of Gauteng Province after the implementation of the new air quality standard.

Abstract

Exposure to fine particulate matter (PM2.5) has been linked to a substantial disease burden globally, yet little has been done to estimate the population health risks of PM2.5 in South Africa due to the lack of high-resolution PM2.5 exposure estimates. We developed a random forest model to estimate daily PM2.5 concentrations at 1 km2 resolution in and around industrialized Gauteng Province, South Africa, by combining satellite aerosol optical depth (AOD), meteorology, land use, and socioeconomic data. We then compared PM2.5 concentrations in the study domain before and after the implementation of the new national air quality standards. We aimed to test whether machine learning models are suitable for regions with sparse ground observations such as South Africa and which predictors played important roles in PM2.5 modeling. The cross-validation R2 and Root Mean Square Error of our model were 0.80 and 9.40 μg/m3, respectively. Satellite AOD, seasonal indicator, total precipitation, and population were among the most important predictors. Model-estimated PM2.5 levels successfully captured the temporal pattern recorded by ground observations. Spatially, the highest annual PM2.5 concentration appeared in central and northern Gauteng, including northern Johannesburg and the city of Tshwane. Since the 2016 changes in national PM2.5 standards, PM2.5 concentrations have decreased in most of our study region, although levels in Johannesburg and its surrounding areas have remained relatively constant. This is an advanced PM2.5 model for South Africa with high prediction accuracy at the daily level and at a relatively high spatial resolution. Our study provides a reference for predictor selection, and our results can be used for a variety of purposes, including epidemiological research, burden of disease assessments, and policy evaluation.
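The two accuracy figures quoted above (cross-validation R2 and Root Mean Square Error) can be illustrated with a minimal Python sketch. This is not the authors' code: the number of folds, the helper names (`kfold_indices`, `rmse`, `r2`), and the use of the coefficient-of-determination form of R2 are all assumptions made for illustration, since the text here does not state how the metrics were computed.

```python
import math
import random

def kfold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k shuffled, non-overlapping folds.
    The value of k is an assumption; the text does not state it."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Made-up observed and held-out predicted daily PM2.5 values (ug/m3).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r2(observed, predicted), 3), round(rmse(observed, predicted), 2))
```

In a cross-validation setting, each fold from `kfold_indices` would be held out in turn, the model refitted on the remaining folds, and the held-out predictions pooled before calling `r2` and `rmse`.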

Introduction

Fine particulate matter (PM2.5, airborne particles with an aerodynamic diameter of less than 2.5 μm) is a ubiquitous air pollutant that harms human health and wellbeing. Numerous epidemiological studies of both short- and long-term exposure have reported strong associations with adverse health outcomes, including premature mortality and morbidity from a range of diseases (Atkinson et al., 2014; Burnett et al., 2018; Liu et al., 2019). Populations living in developing regions, in particular, are often exposed to levels of particulate matter greatly exceeding WHO standards, and an estimated 4.2 million premature deaths worldwide are attributable to ambient PM2.5 (Cohen et al., 2017; World Health Organization, 2016).

In addition to natural sources such as biomass burning, dust storms, and ocean spray, PM2.5 and its precursors are emitted from several anthropogenic sources, including industrial activities, power generation, vehicle traffic, agricultural burning, and household fuel use (Tucker, 2000). The diverse emission sources and the secondary production that occurs in the atmosphere result in complex distributions of PM2.5 in space and time (Seinfeld and Pandis, 2016). Current air quality monitoring networks are often insufficient to quantify PM2.5 exposure and health risk at the local level, even in high-income countries. Routine monitoring is even sparser or nonexistent in many low- and middle-income countries (Brauer et al., 2016); however, satellite data can be used to fill this gap.

The past decade has seen the increasing application of satellite remote sensing products such as aerosol optical depth (AOD) to estimate surface PM2.5 concentrations. AOD measures the extinction of light by aerosol particles at a given wavelength as it passes through the atmospheric column. Although AOD is often strongly correlated with surface PM2.5 concentration, this relationship is nonlinear and modified by various factors such as meteorology, particle vertical distribution, and particle chemical composition (Hoff and Christopher, 2009; Liu et al., 2005; Sorek-Hamer et al., 2020). Over the past two decades, a number of statistical models have been proposed to capture these relationships at different spatial and temporal scales in order to improve prediction accuracy and robustness, including linear mixed-effects models (Ma et al., 2016b), geographically weighted regression (GWR) (Ma et al., 2014), generalized additive models (Strawa et al., 2013), Bayesian downscalers (Chang et al., 2013), and multi-stage models (Kloog et al., 2014). Most recently, machine learning models such as random forests (Brokamp et al., 2018; Hu et al., 2017) and neural networks (Li et al., 2020) have shown high prediction accuracy. These advanced satellite-driven models are useful tools to fill the data gaps left by sparse ground monitoring networks and enable more comprehensive assessments of PM2.5 exposure and its associated health effects. A random forest model is a good choice for areas where advanced models have never been built to explore the spatiotemporal patterns of PM2.5, because it not only has high prediction accuracy but also provides guidance for predictor selection, which is helpful for future research.
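To make the idea of a random forest that also ranks its predictors more concrete, the following is a toy Python sketch using only the standard library. It is not the model used in this study: the data are synthetic, the three columns (labelled here as AOD, an irrelevant noise variable, and precipitation) are hypothetical stand-ins, the forest settings are arbitrary, and permutation importance is one common importance measure chosen for illustration rather than the measure the authors necessarily used.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth, n_try, rng):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    if depth == 0 or len(rows) < 4:
        return avg(targets)
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):   # random feature subset
        for t in sorted({r[f] for r in rows})[:-1]:    # candidate thresholds
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (sse([targets[i] for i in left])
                     + sse([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                   # sampled features were constant
        return avg(targets)
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [targets[i] for i in left],
                       depth - 1, n_try, rng),
            build_tree([rows[i] for i in right], [targets[i] for i in right],
                       depth - 1, n_try, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if row[f] <= t else high
    return node

def fit_forest(rows, targets, n_trees=20, depth=3, n_try=2, seed=0):
    """Fit each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], depth, n_try, rng))
    return forest

def predict(forest, row):
    return avg([predict_tree(tree, row) for tree in forest])

def mse(forest, rows, targets):
    return avg([(predict(forest, r) - y) ** 2 for r, y in zip(rows, targets)])

def permutation_importance(forest, rows, targets, feature, rng):
    """Increase in MSE after shuffling one predictor column."""
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(forest, shuffled, targets) - mse(forest, rows, targets)

# Synthetic data: columns are hypothetical [aod, noise, precipitation].
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(80)]
targets = [40 * aod + 5 * precip + data_rng.gauss(0, 1) for aod, _, precip in rows]
forest = fit_forest(rows, targets)
importance = [permutation_importance(forest, rows, targets, f, random.Random(f))
              for f in range(3)]
print(importance)
```

In this toy setup the first column drives most of the target by construction, so shuffling it degrades the fit far more than shuffling the others. For brevity the importance is evaluated on the training rows; a held-out set would be the more careful choice.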

As part of the 2004 National Environmental Management: Air Quality Act (Act No. 39 of 2004), approximately 130 ground ambient air quality stations have been established or, for those already in existence, incorporated into the national reporting of air quality levels on the South African Air Quality Information System (SAAQIS, http://saaqis.environment.gov.za/). These stations monitor criteria pollutants and precursors including PM10, PM2.5, carbon monoxide (CO), nitrogen oxides (NOx, NO, and NO2), ozone (O3), and sulfur dioxide (SO2), as well as meteorological factors (Gwaze and Mashele, 2018; South African Air Quality Information System, 2010). Most of these ground stations are in areas with poor air quality, such as low-income settlements that use solid fuel for cooking, heating or lighting (i.e., domestic burning activities), urban areas, areas near large roads, and industrial areas. These stations are situated within communities to assist with the assessment of population exposure to air pollution. The ambient monitoring stations in South Africa are limited in spatial coverage, data availability, and data quality. Only 20 ground stations had available PM2.5 data in our study domain. Hourly raw measurements were only available for 47% of the modeling days in our study period. After quality control, this percentage decreased to 40%.

The South African government promulgated National Ambient Air Quality Standards (NAAQS) for many criteria pollutants in 2009 (Department of Environmental Affairs, 2009) and PM2.5 standards in 2012 (Department of Environmental Affairs, 2012a). The PM10 and PM2.5 standards were designed to become more stringent over time. Table S1 displays the PM2.5 NAAQS with compliance dates; ultimately (i.e., starting 1 January 2030), the 24-h standard of 25 μg/m3 will be equivalent to the WHO Air Quality Guideline, and the annual standard of 15 μg/m3 will be equivalent to the WHO Interim Target 3 (World Health Organization, 2006).
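As a rough illustration of how a series of daily concentrations could be compared against the two 2030 limits quoted above, here is a short Python sketch. The function name and the sample values are hypothetical, and the check is deliberately simplistic: it only counts days above the 24-h limit and compares the mean with the annual limit, without modelling any regulatory details (such as data-completeness requirements) that are not described in this text.

```python
# PM2.5 limits quoted in the text for 1 January 2030 onward (ug/m3).
DAILY_STANDARD = 25.0
ANNUAL_STANDARD = 15.0

def compliance_summary(daily_pm25):
    """Count days above the 24-h limit and compare the mean with the annual limit."""
    exceedance_days = sum(1 for value in daily_pm25 if value > DAILY_STANDARD)
    annual_mean = sum(daily_pm25) / len(daily_pm25)
    return {
        "exceedance_days": exceedance_days,
        "annual_mean": annual_mean,
        "meets_annual": annual_mean <= ANNUAL_STANDARD,
    }

# Made-up daily means for illustration only.
print(compliance_summary([10.0, 30.0, 20.0, 40.0]))
```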

Ambient air quality in South Africa is impacted by a range of sources including natural sources such as biomass burning, dust, lightning, and biogenic sources, as well as anthropogenic sources such as industry, vehicles, domestic burning, and waste burning (Wright et al., 2017). In South Africa, coal is the dominant energy resource, providing 69% of primary energy needs and more than 90% of electricity (The World Bank, 2015; Zulu et al., 2019). Emissions from other human activities, including industry, mining, vehicles, domestic burning, and waste burning, also contribute to PM2.5 pollution, resulting in a significant public health burden (Katoto et al., 2019; Pacella et al., 2007; Wright et al., 2017).

Using monitoring data and interpolation with the Benefits Mapping and Analysis Program (BenMAP) model, Altieri and Keen (2019) estimated that if the WHO Guidelines for annual average PM2.5 concentrations were met across South Africa, 28,000 premature deaths (95th percentile CI 15,000–52,000) could be avoided, with economic costs of over $29 billion (~4.5% of South Africa's GDP). A key source of uncertainty in this estimate is the limited coverage of PM measurements across South Africa.

The South African government declared three national air quality priority areas where ambient air quality exceeds (or is projected to exceed) the NAAQS, in order to focus concerted and specific air quality measures on addressing the poor air quality (Department of Environmental Affairs, 2005). These Priority Areas are the Vaal Triangle Airshed Priority Area (VTAPA) (Department of Environmental Affairs and Tourism, 2006), the Highveld Priority Area (HPA) (Department of Environmental Affairs and Tourism, 2007), and the Waterberg-Bojanala Priority Area (WBPA) (Department of Environmental Affairs, 2012b) (Fig. 1). The Priority Areas contain most of South Africa's large industrial hubs and coal-fired power plants. The Priority Areas also contain portions of Gauteng Province, where the mega-city conurbation of Johannesburg-Tshwane-Ekurhuleni is located, the domain of this study.

Ambient air quality levels in the study domain often exceed South African National Ambient Air Quality Standards, with exceedances driven by high PM2.5, PM10, and ozone concentrations (Gregor et al., 2019; Kogieluxmie and Venkataraman, 2019; uMoya-NILU, 2017; Venter et al., 2012). A decreasing trend in PM2.5 concentrations has been found at some monitoring sites in the HPA and the VTAPA, though at the current rate, compliance with current standards will take years at some sites and decades at others. PM concentrations generally peak in the dry winter season, driven by increases in both emissions (e.g., domestic burning, wind-blown dust, biomass burning) and stagnant meteorological conditions (Hersey et al., 2015; Tyson et al., 1988; Xulu et al., 2020). In general, PM concentrations are higher in low-income settlements compared to monitoring sites in industrial and middle-class residential areas (Hersey et al., 2015). Large-scale regional biomass burning impacts this region in the late winter and spring, leading to peak Aerosol Optical Depth (AOD) levels (Archibald et al., 2010; Duncan et al., 2003; Hersey et al., 2015; Queface et al., 2011; Tesfaye et al., 2011).

The ability to accurately estimate exposure to ground-level PM2.5 is essential to study its adverse health effects and assess the effectiveness of air pollution control measures. In this study, we built a 1 km2 resolution daily PM2.5 concentration model for Gauteng Province and the surrounding areas from 2014 to 2018, based on satellite AOD, meteorological fields, land use data, and socioeconomic variables. Advanced models of this kind have rarely been developed in South Africa or elsewhere in Africa. Our goal is to explore the feasibility of developing machine learning models in this region with sparse and incomplete ground measurements and to gain insight into important predictors of PM2.5 levels. We also assessed the change in regional PM2.5 levels before and after the implementation of new national air quality standards.

Section snippets

Study area

Our study area is approximately 200 × 230 km2 in the northeast of South Africa on the Highveld Plateau (average altitude ~1700 m), covering all of Gauteng Province, western Mpumalanga Province, and the eastern part of the North-West Province, as shown in Fig. 1. Gauteng Province, with a population of approximately 15 million people, is the country's economic engine, contributing more than a third of South Africa's GDP in 2017 (http://www.statssa.gov.za/?p=11092). Gauteng Province contains the

Ground measurements and gap-filled AOD

Ground measurements were available from January 1st, 2014 through December 31st, 2018. A total of 14,927 daily PM2.5 values were calculated from the hourly measurements, ranging from 72 to 1386 per monitoring site. The multi-year mean and standard deviation of daily PM2.5 for all stations were 27.23 μg/m3 and 20.95 μg/m3, respectively, with a range of daily values of 0.91–263.88 μg/m3.
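The step of deriving daily values from hourly measurements can be sketched in Python as follows. This is only an illustration under stated assumptions: the completeness rule (`min_hours=18`, i.e., at least 18 valid hours per day) and the filter that drops missing or negative readings are hypothetical choices, as the quality-control criteria actually applied are not given in this excerpt.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_means(hourly, min_hours=18):
    """Average hourly PM2.5 readings per calendar day.

    hourly: iterable of (datetime, value) pairs; value may be None.
    min_hours: hypothetical completeness threshold (not from the source).
    """
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        if value is not None and value >= 0:   # drop missing or negative readings
            by_day[timestamp.date()].append(value)
    return {day: sum(values) / len(values)
            for day, values in sorted(by_day.items())
            if len(values) >= min_hours}

# Made-up example: one complete day and one day with only 10 valid hours.
start = datetime(2014, 1, 1)
hourly = [(start + timedelta(hours=h), 20.0) for h in range(24)]
hourly += [(start + timedelta(days=1, hours=h), 50.0) for h in range(10)]
result = daily_means(hourly)
print(result)
```

Under this rule the second day is discarded because it has too few valid hours, which mirrors how quality control can reduce the share of days with usable data.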

There was substantial inter-monitoring site variability in the multi-year average PM2.5 levels, with the lowest

Discussion

It is well known that fine particles are associated with a large burden of disease in South Africa (Altieri and Keen, 2019; Bauer et al., 2019; Global Burden of Disease, 2016; Lim et al., 2012; Wichmann and Voyi, 2012). Nevertheless, the monitoring network in South Africa is uneven, and mainly located in densely populated urban centers or industrial areas. The monitors have variable data quality and data capture rates, hindering PM2.5 exposure assessment for large segments of the South African

Conclusion

Our model is an advanced, high-resolution model to estimate daily PM2.5 concentration in South Africa, with a domain of 200 × 230 km2 that covers the population-dense Gauteng Province and surrounding area. The model was able to reproduce the marked seasonal pattern characteristic of northeastern South Africa and has high prediction accuracy at the daily level with an overall cross-validation R2 of 0.8. Since the change in the national PM2.5 standard in 2016, we observed a reduction of PM2.5 in

Credit author statement

D. Zhang and L. Du contributed equally to this work. This research was conceptualized by Yang Liu. D. Zhang and L. Du analyzed data and drafted the manuscript. W. Wang, Q. Zhu and J. Bi provided technical support for data processing. All authors participated in manuscript preparation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the MAIA science team at the JPL, California Institute of Technology, led by D. Diner (Subcontract #1588347). NS was supported by the NIEHS-funded HERCULES Center (P30ES019776). RMG and MN were supported by a CSIR Parliamentary Grant. We thank the PI of the Pretoria_CSIR-DPSS site from AERONET for establishing and maintaining the site. The authors thank the Department of Environment, Forestry and Fisheries, the South African Weather Service, and the air

References (73)

  • M. Sorek-Hamer et al.

    Review: strategies for using satellite-based products in modeling PM2.5 and short-term pollution episodes

    Environ. Int.

    (2020)
  • W.G. Tucker

    An overview of PM2.5 sources and control strategies

    Fuel Process. Technol.

    (2000)
  • S. Archibald et al.

    Southern African fire regimes as revealed by remote sensing

    Int. J. Wildland Fire

    (2010)
  • R.W. Atkinson et al.

    Epidemiological time series studies of PM2.5 and daily mortality and hospital admissions: a systematic review and meta-analysis

    Thorax

    (2014)
  • S. Bauer et al.

    Desert dust, industrialization, and agricultural fires: health impacts of outdoor air pollution in Africa

    J. Geophys. Res.-Atmos.

    (2019)
  • M. Brauer et al.

    Ambient air pollution exposure estimation for the global burden of disease 2013

    Environ. Sci. Technol.

    (2016)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • C. Brokamp et al.

    Predicting daily urban fine particulate matter concentrations using a random forest model

    Environ. Sci. Technol.

    (2018)
  • R. Burnett et al.

    Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter

    Proc. Natl. Acad. Sci. U. S. A.

    (2018)
  • Center for International Earth Science Information Network

    Gridded Population of the World, Version 4 (GPWv4): Administrative Unit Center Points with Population Estimates

    (2016)
  • H.H. Chang et al.

    Calibrating MODIS aerosol optical depth for predicting daily PM2.5 concentrations via statistical downscaling

    J. Expo. Sci. Environ. Epidemiol.

    (2013)
  • A.J. Cohen et al.

    Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the Global Burden of Diseases Study 2015

    Lancet

    (2017)
  • A. David et al.

    Spatial Poverty and Inequality in South Africa: A Municipality Level Analysis

    (2018)
  • Department of Environment Forestry and Fisheries

    Draft Second Generation Air Quality Management Plan for Vaal Triangle Airshed Priority Area

    (2020)
  • Department of Environmental Affairs

    National Environment Management: Air Quality Act [No. 39 of 2004]

  • Department of Environmental Affairs

    National Environmental Management: Air Quality Act of 2004 (Act No. 39 of 2004)

    (2009)
  • Department of Environmental Affairs

    National Environmental Management: Air Quality Act of 2004 (Act No. 39 of 2004)

    (2012)
  • Department of Environmental Affairs

    National Environmental Management: Air Quality Act, 2004 (Act no. 39 of 2004) Declaration of the Waterberg National Priority Area

  • Department of Environmental Affairs, & Department of Rural Development and Land Reform

    2018 South African National Land-Cover Change Assessments, DEA E1434 Land-Cover.

    (2019)
  • Department of Environmental Affairs and Tourism

    Declaration of the Vaal Triangle Air-Shed Priority Area in Terms of Section 18(1) of the National Environmental Management: Air Quality Act 2004, (Act no. 39 of 2004)

  • Department of Environmental Affairs and Tourism

    Declaration of the Highveld as Priority Area in Terms of Section 18(1) of the National Environmental Management: Air Quality Act, 2004 (Act no. 39 of 2004)

  • B.N. Duncan et al.

    Interannual and seasonal variability of biomass burning emissions constrained by satellite observations

    J. Geophys. Res.-Atmos.

    (2003)
  • E. Frame et al.

    Measuring Multidimensional Poverty among Youth in South Africa at the Sub-National Level

    (2016)
  • R.M. Garland et al.

    Air quality indicators from the Environmental Performance Index: potential use and limitations in South Africa

    Clean Air Journal

    (2017)
  • Global Burden of Disease

    Global burden of air pollution. Institute for Health Metrics and Evaluation (IHME)

    (2016)
  • F. Gregor et al.

    Assessment of changes in concentrations of selected criteria pollutants in the Vaal and Highveld Priority Areas

    Clean Air Journal

    (2019)