Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China

https://doi.org/10.1016/j.ecoenv.2021.112772
Under a Creative Commons license
open access

Highlights

  • The combined genetic algorithm-support vector machine (GA-SVM) is a novel and robust method for estimating PM2.5 concentrations.

  • The GA-SVM approach outperforms conventional land use regression and SVM models, as well as the results of previous related studies.

  • Seasonal variation and spatial autocorrelation of PM2.5 concentrations in Shaanxi are evaluated.

Abstract

With rapid economic growth, urbanization and industrialization, fine particulate matter with an aerodynamic diameter ≤ 2.5 µm (PM2.5) has become a major pollutant with adverse effects on both human health and the atmospheric environment. Many studies have estimated PM2.5 concentrations using statistical regression models and satellite remote sensing. However, traditional regression models limit the accuracy of PM2.5 concentration estimates, and although machine learning methods have high predictive power, few studies have examined the complementary advantages of different approaches. This study estimates PM2.5 concentrations in Shaanxi, China, in 2015 from satellite remote sensing-derived aerosol optical depth (AOD) products, meteorological data, terrain data and other predictors, using a combined genetic algorithm-support vector machine (GA-SVM) method, and then explores the spatial clustering pattern at the seasonal and annual levels. The results indicate that temperature (r = −0.684), precipitation (r = −0.602) and the normalized difference vegetation index (NDVI) (r = −0.523) were significantly negatively correlated with the PM2.5 concentration, while AOD (r = 0.337) was significantly positively correlated with it. Compared with conventional land use regression (LUR) and SVM models and with previous related studies, the GA-SVM method predicted PM2.5 concentrations significantly more accurately, with a higher 10-fold cross-validation coefficient of determination (R2) of 0.84 and a lower root mean square error (RMSE) and mean absolute error (MAE) of 12.1 μg/m3 and 10.07 μg/m3, respectively. A Y-scrambling test shows that the model performance is not due to chance correlation. The central and southern parts of Shaanxi have high PM2.5 concentrations, mainly because of the pollutant emissions and the meteorological and topographical conditions in those areas.
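
The GA-SVM idea described above can be illustrated with a minimal sketch: a small genetic algorithm searches the hyperparameters of an RBF support vector regressor, scoring each candidate by 10-fold cross-validated RMSE, and the best candidate is then summarized with R2, RMSE and MAE. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic data, the choice of scikit-learn, the two-gene encoding (log10 C, log10 gamma), the search bounds, the population size, and the selection, crossover and mutation operators.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): four predictor columns
# loosely playing the roles of AOD, temperature, precipitation and NDVI.
n = 120
X = rng.normal(size=(n, 4))
y = (50 + 5 * X[:, 0] - 10 * X[:, 1] - 8 * X[:, 2] - 6 * X[:, 3]
     + rng.normal(scale=3, size=n))

cv = KFold(n_splits=10, shuffle=True, random_state=0)
# Assumed search bounds for the two genes of a chromosome.
BOUNDS = np.array([[-1.0, 3.0],   # log10(C)
                   [-4.0, 1.0]])  # log10(gamma)


def cv_predict(genes):
    """10-fold cross-validated predictions of an RBF SVR for one chromosome."""
    svr = SVR(kernel="rbf", C=10 ** genes[0], gamma=10 ** genes[1])
    return cross_val_predict(make_pipeline(StandardScaler(), svr), X, y, cv=cv)


def fitness(genes):
    """Higher is better: negative cross-validated RMSE."""
    return -np.sqrt(mean_squared_error(y, cv_predict(genes)))


def run_ga(pop_size=8, generations=5, mutation_scale=0.3):
    pop = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(pop_size, 2))
    for _generation in range(generations):
        scores = np.array([fitness(g) for g in pop])
        children = [pop[scores.argmax()].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            parents = []
            for _pick in range(2):  # binary tournament selection
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if scores[i] >= scores[j] else pop[j])
            w = rng.uniform(size=2)  # per-gene blend crossover weights
            child = w * parents[0] + (1 - w) * parents[1]
            child += rng.normal(scale=mutation_scale, size=2)  # Gaussian mutation
            children.append(np.clip(child, BOUNDS[:, 0], BOUNDS[:, 1]))
        pop = np.array(children)
    scores = np.array([fitness(g) for g in pop])
    return pop[scores.argmax()]


best = run_ga()
pred = cv_predict(best)
r2 = r2_score(y, pred)
rmse = np.sqrt(mean_squared_error(y, pred))
mae = mean_absolute_error(y, pred)
print(f"C={10 ** best[0]:.3g}, gamma={10 ** best[1]:.3g}")
print(f"R2={r2:.2f}, RMSE={rmse:.2f}, MAE={mae:.2f}")
```

Searching in log space is a common choice because useful values of C and gamma span several orders of magnitude. A Y-scrambling check could reuse `cv_predict` after permuting `y`; a model that still scores well on shuffled targets would point to chance correlation.
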
Regional PM2.5 pollution showed positive spatial agglomeration, and a spatial spillover effect of PM2.5 pollution was present at both the seasonal and annual scales. In general, the GA-SVM method is robust and estimates PM2.5 concentrations accurately by applying a novel modeling framework to high-quality spatiotemporal information. It is also significant for the exploration of PM2.5 pollution estimation and high-precision mapping methods, especially for early warning in high-risk areas. Finally, the prevention and control of atmospheric pollution should coordinate control measures across major cities and their surrounding cities, and focus on joint pollution control for cities located on the plain.
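
Spatial agglomeration of this kind is often quantified with a global spatial autocorrelation statistic. The abstract does not name the statistic used, so the sketch below assumes global Moran's I with binary rook-contiguity weights on a regular grid; the grid, the toy values and the function names are hypothetical and only illustrate how clustered and dispersed patterns differ in sign.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the indices of cells sharing an edge."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            neighbors[cell] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 for neighbors, else 0)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in z)


rows, cols = 4, 4
nbrs = rook_neighbors(rows, cols)

# Toy pattern 1: values increase smoothly from the top row to the bottom row.
gradient = [float(r) for r in range(rows) for _ in range(cols)]
# Toy pattern 2: high and low values alternate like a checkerboard.
checkerboard = [float((r + c) % 2) for r in range(rows) for c in range(cols)]

i_gradient = morans_i(gradient, nbrs)
i_checkerboard = morans_i(checkerboard, nbrs)
print(f"gradient: {i_gradient:.3f}, checkerboard: {i_checkerboard:.3f}")
```

A positive value, as for the smooth gradient, indicates that similar concentrations cluster together, whereas the alternating pattern gives a negative value. With real monitoring data the weights would come from the actual neighborhood structure of the spatial units, and significance is usually assessed with a permutation test.
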

Keywords

PM2.5
Machine learning
GA-SVM
Land use regression
Method improvement
Spatial clustering
