Introduction

Atmospheric particulate matter (PM) pollution, particularly PM2.5 and PM10, poses a severe and growing threat to global public health (Orioli et al. 2018). Exposure to the high concentration of PM has a strong association with different health hazards such as respiratory diseases, cancer, and cardiovascular disease. (Kim et al. 2015). In a clinical meta-analysis, Kim et al. (2015) revealed that about 3% of cardiopulmonary and 5% of lung cancer deaths are attributable to PM exposure globally. The study also argued that the existence of PM in the atmosphere poses more threat to public health than that of other ambient air pollutants. Moreover, a new study revealed that an increase of 1 g m−3 in PM2.5 could accelerate the death rate of the coronavirus disease 2019 (COVID-19) by 15% (Wu et al. 2020). Thus, numerous scientific studies illustrated the strong evidence of the association between health hazards and PM concentration. It occurs, mostly, for the size and composition of the particles. Both particles are constituted by other subclasses of pollutants with the major ones being water-soluble ions, i.e., sulfates, nitrates, ammonium, and minor constituents such as metal ions, organic and elemental carbon, and volatile organics. They can be emitted into the air from natural or anthropogenic sources, and secondary formation in the atmosphere (Lu et al. 2016).

Since air quality is vital for health and the environment, it is essential to regulate proper controlling mechanisms. Pollution modeling can act as a preliminary step of controlling mechanisms (Salnikov and Karatayev 2011). Generally, atmospheric pollution modeling (APM) is defined as the numerical tool that illustrates the casual relationships among emissions, meteorology, atmospheric concentrations, depositions, and other factors (Daly and Zannetti 2007). The APM techniques mainly categorized into three types, i.e., physical model, dispersion model, and machine learning model (Sportisse 2007). However, some other models are broadly implemented in atmospheric sciences, i.e., Gaussian models, Lagrangian models, and Eulerian Models. Commonly, these models work based on continuous emission records and conservation of mass (Gaussian models), the trajectory of air parcel, and wind data (Lagrangian models), and gridded atmospheric properties (Eulerian models). Apart from those, a prognostic model, namely, chemical transport model (CTM) (e.g., GEOS-Chem, CMAQ, WRF-Chem) processes emission, transport, and chemical conversion of trace gases and aerosols simultaneously with meteorological parameters. These models incorporate atmospheric science and multi-processing computational approaches, including the real-time updated emission inventory inputs, and meteorological records (Daly and Zannetti 2007). Unfortunately, the application of these models is further limited by some complexities in terms of geophysical characteristics, i.e., land use and terrain (Jiménez and Dudhia 2013). However, several recent studies argued that the traditional deterministic models struggle to capture the non-linearity among pollutants’ concentration, meteorology, land use, and emission and dispersion sources (Shimadera et al. 2016; Chen et al. 2017). To tackle and minimize the limitations of the models, however, machine learning algorithms seem promising (Rybarczyk and Zalakeviciute 2018).

The traditional statistical approaches are limited by describing the variables based on probability and statistical average. In contrast, machine learning models such as artificial neural network (ANN), support vector machine (SVM), and random forest regression (RFR) have been performed as the most popular classifiers to efficiently overcome the non-linear uncertainties and trends to accomplish better forecasting accuracy (Joharestani et al. 2019). Though the models do not unambiguously simulate the environmental process, in general, they exhibit better prognostic performance than the CTMs on spatiotemporal scale in the existence of extensive monitoring records (Marshall et al. 2008). Several studies have been conducted in different countries to evaluate the performance of machine learning models in the field of air quality modeling and forecasting (Kang et al. 2018). However, based on relevant literature, the study of machine learning in air pollution modeling was limited in Bangladesh, though multiple studies were performed to investigate the particulate pollution (Begum et al. 2011; Begum and Hopke 2018). The most used statistical technique to forecast air quality in Bangladesh was Seasonal ARIMA (Islam et al. 2020). Therefore, considering these observations, the study aims to investigate the application of machine learning models, i.e., ANN, L-SVM, M-SVM, Gaussian process regression (GPR), RFR, and a time series model namely PROPHET, on particle pollution modeling in four metropolitan cities in Bangladesh. Among them, ANN, L-SVM, and RFR were used in many air pollution research across the world. However, a limited study found in terms of investigating GPR, and M-SVM machine learning algorithms for pollution modeling (Rybarczyk and Zalakeviciute 2018). A new additive time series model PROPHET, developed by Facebook’s Core Data Science Team, was also implemented in this study to compare with the results of machine learning models. The rationale using this model was its specialty to forecast non-linear trends with yearly, weekly, seasonality, and holiday effects. Besides, the study will demonstrate the regional relationships between PM2.5 and PM10 concentration with meteorological parameters and the other air pollutants, i.e., nitrogen oxides (NOx), sulfur-di-oxide (SO2), carbon-mono-oxide (CO), and ozone (O3), which will be later considered as the exploratory variables to investigate the models.

Methodology

Air monitoring stations and data

This study used 24-h air quality (PM2.5, PM10, NOX, SO2, CO, O3) and meteorological data (mean temperature, rainfall, relative humidity, barometric pressure, wind speed, wind direction, and solar radiation) of four air monitoring stations, i.e., Dhaka, Chattogram, Rajshahi, and Sylhet, which were provided by the Department of Environment (DoE), Government of Bangladesh. Among them, Dhaka and Chattogram are ranked 19th and 76th having poor air quality, respectively, in the world (WHO 2018). The first air monitoring station (MS-1) in Dhaka was placed at 23.78° N and 90.36° E and characterized by heavy traffic and transportation. The MS-1 was positioned about 100 m away from the main road, and the height of the roof was approximately 7 m above the ground. The second air monitoring site (MS-2), Chattogram, was located at 22.32° N and 91.80° E. The sampling inlets of MS-2 were positioned on the flat roof of the monitoring site shelter, about 7 m above the ground. Unlike MS-1, the site was a residential area and, therefore, not much influenced by local sources. The location of the third monitoring site (MS-3), Rajshahi, was at 24.38° N and 88.61° E, which was approximately 3 km north from the downtown and 10 m away from a moderate traffic source. The sampler inlet was placed on a flat roof which was 5 m above the ground. The fourth and final monitoring site (MS-4) of this study was situated at Sylhet (24.89° N and 91.87° E). The location of MS-4 was 20 m far from the Kin Bridge of Sylhet and characterized with moderate traffic. The roof height was about 12 m above the ground. For every station, the intake nozzle of the sampler was placed 1.8 m above the roof with proper ventilation. To measure PM2.5 and PM10 concentrations, an automatic and real-time suspended particulate monitor (Beta Gauge 101 M; ENVIRONMENT SA, France) was used. The UV-fluorescence AF22M (TELEDYENE/API, USA), chemiluminescence gas analyzer AC32M, UV photometric ozone analyzer-42M, and dispersive infra-red carbon monoxide analyzer-12M (ENVIRONMENT SA, France) were utilized to measure the concentration of SO2, NOx, O3, and CO, respectively. To maintain quality assurance and control of the data, calibration was routinely performed. While processing, the data were checked for outliers and if 75% of the data in a day were not available for any parameter due to power failure or equipment’s nonoperational, values were considered as non-representative and excluded from the analysis. The amount of total captured data for MS-1, MS-2, MS-3, and MS-4 was 90.4%, 90%, 87.6%, and 86.5%, respectively, from January 2013 to June 2019.

Data pre-processing

The study performed data pre-processing to maintain the consistency of the raw dataset containing 2372 observations. To process the checking and removal of spatiotemporal outliers from raw data, the Z scores method was used before the calculation of statistical parameters, in consistency with previous studies (Barzeghar et al. 2020). Firstly, the series data were transformed into Z scores. The observations in the transformed series were rejected from the original series meeting the following conditions: (i) having absolute Z score is greater than 4 (|Zt| > 4), (ii) the increment from the previous value of the series is larger than 9 (ZtZt-1 > 9), and (iii) the ratio of the Z score value to its centered mean of order 3 (MA3) being greater than 2 (Zt/MA3(Zt) > 2). On the other hand, to correct the missing values, the study used the nearest neighbor method (NN) which aims to provide unbiased and valid estimates of associations based on information from the available data. It is also widely known as the standard method to handle missing values in many areas of research (Li et al. 2019). The algorithm is a similarity-based concept that relies on distance metrics. In this work, we used the Minkowski norm (D) given by Eq. (1) as a metric to evaluate distance in form of the Euclidean, when p = 2:

$$ \mathrm{D}={\left({\sum}_{i=1}^n{\left|{x}_i-{y}_i\right|}^p\right)}^{1/p} $$
(1)

where xi and yi are the test sample and training data, respectively.

Feature selection

A model-free method, i.e., Boruta algorithm (BA), was used in this study to identify the features for the models’ prediction. The overall procedure was conducted in six steps, i.e., (a) creating duplicate copies of predictors, i.e., meteorological parameters and pollutants; (b) performing the random shuffle original predictors and duplicate copies of predictors to remove their correlation with the outcomes; (c) applying RFA to find out the most important predictors based on the higher mean values; (d) estimating the Z score by using mean and standard deviation; (e) finding the Zmax score on duplicates predictors; (f) repeating the above steps till iteration completes for all the air monitoring stations separately.

PROPHET

PROPHET implements the decomposition of the time series with three main components which are seasonality, overall trends, and holidays (Papacharalampous and Tyralis 2018). The Eq. (2) is a result of those three components:

$$ y(t)=g(t)+s(t)+h(t)+{\in}_t $$
(2)

where g(t) = the trend function which models non-periodic changes in the value of the time series, s(t) = periodic changes (e.g., weekly and yearly seasonality), and h(t) = the effects of holidays which occur on potentially irregular schedules over one or more days; ∈t = any idiosyncratic changes which are not accommodated by the model.

Artificial neural network

ANN models are based on the interactions of neurons by transferring signals to another one along with weighted connections (Feng et al. 2015). Besides, in ANN model systems, each neuron is coupled with all preceding neurons and following the layers by links (Bai et al. 2016). In the input layer, every input value is regarded as a neuron. For the success of ANN, all input values should be weighted firstly, and then, the weighted values are processed into the hidden layers. In that layer, every neuron produces output values. The Eq. (3) calculates the outputs:

$$ o=f\left({\sum}_{j=0}^n{w}_{ji}{x}_j\right) $$
(3)

where f = non-linear function, xj = jth input to the neuron, vj = jth synaptic weight, and n = the number of inputs (Gomez-Sanchis et al. 2006). All data sets need to be normalized before the experimentation of the ANN model. This is the basic method of artificial studies. The equation of normalization (Eq. 4) is given below:

$$ {NI}_{ij}=\frac{I_{\left(i,j\right)}-\min (j)}{\max (j)-\min (j)} $$
(4)

where I = input value, NI = standardized value, i = number of patterns, and j = value of variables. In this study, multilayer perceptrons (MLP) was used as it is the most classical type of ANN. After experimenting on several MLP structures, the study decided to utilize two hidden layers.

Random forest regression

As the RF model is a supervised learning algorithm, three user-defined properties should be determined in the application of this model. They are the number of predictors that are used to make each tree (mtry), the number of trees in the forest (ntree), and the minimum number of terminal nodes (nodesize). Three user-defined parameters should be determined in RF modeling, which are the number of variables used to grow each tree (mtry) that creates the strength of the tress in the forest and establishes the correlations among them, the number of trees in the forest (ntree), and the minimum number of terminal nodes (nodesize) which should be set to fit an RF. The predictive performance of the RF model, however, is enhanced by increasing the tree strength and decreasing the number of correlations among trees (Brokamp et al. 2017). Firstly, the n number of training sample subset is expressed as D1, D2, ……, Dn from total training data set D using BS. Secondly, based on the subsets, n number of regression trees is created. Accordingly, n number of regression result is obtained. Finally, the optimal output is established by aggregating the results of each regression trees. In this study, we selected mtry = 5 and ntree = 500 using BS from the input data. Therefore, regression trees grew based on training data for each of one-fourth the total samples.

Gaussian process regression

Gaussian process models (GPM) are probabilistic and non-parametric in nature which generally works on the basic principles of Bayesian probability. In this study, exponential GPR was experimented. The task of GPR was to infer a mapping from a set of “D” dimensional regression vectors denoted by the regression matrix X = [x1, x2, ….., xn]T to a vector of output data y. This denotes:

$$ D={\left\{{x}_{ij},{y}_i\right\}}_{i=1}^N;{x}_i\epsilon \boldsymbol{X};{y}_i\epsilon \mathrm{\mathbb{R}} $$
(5)

The outputs (y1, y2, ……., yn) are usually assumed to be noisy realizations of the underlying function f (xi). A GP model assumes that the output is a realization of a GP with a joint probability density function:

$$ f=N\left(m,k\right);m:\boldsymbol{X}\to \mathrm{\mathbb{R}};k:C\left({x}_i,{x}_j\right) $$
(6)

where m = mean function and k = covariant function. Generally, the GP model assumes that the output is a realization of a GP (here, noted as N in Eq. 6) with a joint probability density function with the mean covariance being functions of the inputs. Usually, the m is defined as 0 and k defines the characteristics of the process to be modeled. To make the predictions, the study used the posterior and the marginal likelihood for selecting hyperparameters. The posterior predictive distribution is expressed as the following equation:

$$ p\left(\left.y\right|x\right)=\iint p\left(\left.y\right|f,\boldsymbol{X},\theta \right),p\left(\left.f\right|\theta \right),p\left(\theta \right), d f, d\theta $$
(7)

The interest of the study was the log marginal likelihood, as the quality of its approximation and the posterior approximation of the study was linked:

$$ l\left(\theta \right)=-\frac{1}{2}\log \left(\left|k\right|\right)-\frac{1}{2}{y}^T{k}^{-1}y-\frac{N}{2}\mathit{\log}\left(2\pi \right) $$
(8)

Support vector machine

In this study, the predicted concentration of particulate matter was followed by following SVM operated formula:

$$ {\hat{Y}}_0=\sum \limits_{i=1}^nK\left({X}_i,{X}_0\right)\left({\alpha}_i-{\alpha}_i^{\ast}\right);K\left({X}_i,{X}_0\right)=\exp {\left(-\left|{X}_i-{X}_j\right|\right)}^2/\mathrm{Y} $$
(9)

where αi and \( {\alpha}_i^{\ast } \)= support vectors and K(Xi, X0) = radial basis kernel function. The practice of an appropriate kernel function (KF) is one of the main features in SVM applications since SVMs are characterized by the usage of KF. It provides the capability of representing non-linear data in the input spaces that in essence are linear; then, an optimization procedure can be applied as in the linear case. This delivers a means to dimension the problem properly; however, the results still depend on the good selection of a set of training datasets. The Gaussian kernel function (GKF) is defined as-

$$ K\left({X}_i,{X}_0\right)=\exp \left(\frac{-{\left\Vert {x}_i-{x}_j\right\Vert}^2}{2{\sigma}^2}\right) $$
(10)

The GKF provides an estimate for the consistency of the forecast in the form of the variance of the predictive distribution and the analysis can be used to estimate the evidence in favor of a particular choice of the covariance function. The covariance or kernel function can be seen as a model of the data, thus providing a principled method for model selection (Singh et al. 2013). In this study, SVM in linear SVM (l-SVM) and medium Gaussian SVM (mG-SVM) was used.

Evaluation metrics

The performance of the models was evaluated based on the coefficient of determination (R2) (Eq. 12), root mean square error (RMSE) (Eq. 13), and mean absolute error (MAE) (Eq. 14)

$$ {\mathrm{R}}^2=\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{y}}_{\mathrm{i}}-{\mathrm{y}}_{\mathrm{mean}}\right)}^2}{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{x}}_{\mathrm{i}}-{\mathrm{x}}_{\mathrm{mean}}\right)}^2} $$
(12)
$$ \mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}\left({\left({\mathrm{x}}_{\mathrm{i}}-{\mathrm{y}}_{\mathrm{i}}\right)}^2\right.}{\mathrm{n}}} $$
(13)
$$ \mathrm{MAE}=\frac{1}{n}{\sum}_{i=1}^n\left|{x}_i-{y}_i\right| $$
(14)

where xi = ith observed value, (i = 1, 2, 3, ……, n), xmean = mean of observed value, yi = ith simulated value, and ymean = mean of simulated value.

Results and discussions

Data overview

The descriptive statistics for air pollutants and meteorological variables are presented in Table 1. Among the four stations, the highest annual mean concentrations of PM10 were observed in Dhaka (160.4 μg m−3) and the lowest was observed in Rajshahi (109.4 μg m−3). Like PM10, the highest concentration of PM2.5 belonged to Dhaka (90.5 μg m−3) and the lowest concentration was in Rajshahi (67.1 μg m−3). The annual mean concentrations of PM2.5 and PM10 in Chattogram were 70.6 μg m−3 and 133.2 μg m−3 which was the 2nd highest among the stations. The seasonal and annual patterns of PM2.5 and PM10 were illustrated with the comparison of WHO air quality standards in Fig. S6. It demonstrated that annual averages of the PM2.5 and PM10 concentration in the air of the Dhaka, Chattogram, Rajshahi, and Sylhet are greater than the standards of WHO. In Dhaka, it is about six times greater than the standard. Moreover, the annual PM concentration of stations surpassed the value of Bangladesh Air Quality Standard (BNAAQS). The standard value of annual PM2.5 and PM10 according to BNAAQS is 15 μg m−3 and 50 μg m−3 respectively (Table S5).

Table 1 Annual mean (μ) ± standard error (SE), and standard deviation (SD) of the air pollutants’ concentration and meteorological parameters across the four air monitoring stations in Bangladesh during 2013–2019

Seasonal variation of PM

The overall statistics of seasonal meteorological patterns across the stations are represented in supplementary Tables S1, S2, S3, and S4. It was cleared that the concentration of PM10 and PM2.5 across the stations was highest in winter whereas it was lowest in the monsoon season. The fluctuation pattern with the seasonal variation throughout the stations was almost the same. In winter, the highest mean concentration of PM2.5 and PM10 was observed in Dhaka (186.6 μg m−3 and 284.9 μg m−3 respectively) and the lowest in Sylhet (146.0 μg m−3) and Rajshahi (207.7 μg m−3) respectively. However, in monsoon, it was found the highest PM2.5 concentrations in Rajshahi (31.4 μg m−3) and lowest in Sylhet (26.4 μg m−3). From the above statistics, it is clear that there is a relation among the particulate matters and meteorological variables of the seasons. Like this study, Manju et al. (2018) found the similar relationship among the meterological parameters and air pollutants in India.The correlation among the meteorological parameters and the particulate matters of the study is illustrated in Fig. S4 and Fig. S5. Begum et al. (2011) revealed that brick kilns are responsible for the highest concentration of PM in Dhaka during winter as northwestern wind transports the PMs from the brick kilns located in Dhaka. However, a positive correlation was found between PMs and temperature during the monsoon season in Dhaka. It can be addressed by high summer temperatures after the rainfall in that season. Besides, the combined effect of high wind and temperature can accelerate the concentration (Kayes et al. 2019). Apart from the significant correlation with meteorological parameters, PM was also highly correlated with other gaseous air pollutants (Fig. S5). In Dhaka, PM2.5 was significantly correlated with PM10, CO, NOx, and SO2, especially during March, April, May, and June of the year. At that time, for PM2.5 and PM10, the highest correlation was found with CO because of the on-road traffic congestion. During the pre-monsoon and post-monsoon period, the SO2 was found highly correlated with NOx in every air monitoring stations. During that period, for the NOx emission, traffic was not the only significant source, but rather a substantial amount of NOx was emitted to the atmosphere from the main source of SO2 emissions. Begum et al. (2011) revealed that the main source of SO2 emissions in Dhaka is brick kilns.

Model execution

Before the model execution, data splitting was carried out by dividing data into two subsets, i.e., training data and testing data. The study used unsupervised 5-fold cross-validation for data splitting. In each fold, the dataset is separated into two training sets (75% of total data) and a remaining 25% as a hold-out test set utilized to evaluate the performance of the models after the training process. After dividing the data sets, the study used BA to select the most important variables before running the models. The results showed that both for PM2.5 and PM10, temperature, RH, BP, and WD were the most important predictors among the meteorological variables across the stations (Fig. S7). For PM10 prediction, the relative importance scores of the most significant meteorological predictor, Temp, in Dhaka (MS-1), Chattogram (MS-2), Rajshahi (MS-3) and Sylhet (MS-4) were 13.7, 13.8, 14.27, and 14.08, respectively. The next important meteorological variable was RH, for the stations (12.8, 12.9, 13.6, and 12.11 for MS-1, MS-2, and MS-3, respectively). However, unlike PM10, the most influential meteorological predictor for PM2.5 prediction in MS-1, MS-2, MS-3, and MS-4 was BP (15.12, 14.83, 14.7, and 14.74 for MS-1, MS-2, MS-3, and MS-4 respectively). On the other hand, among the chemical species fed into the models to predict PM2.5 and PM10, SO2 and NOx were the most influential exploratory variables. In terms of PM2.5 prediction, the relative importance scores of SO2 were 12.58, 12.55, 12.34, and 12.35 in MS-1, MS-2, MS-3, and MS-4 respectively.

PROPHET

The study used a time series model, i.e., PROPHET, to compare it with other machine learning models. Comparatively, PROPHET did not perform better than the machine learning models for the prediction of PM2.5 and PM10. In terms of PM2.5 prediction, the R2 values of the PROPHET for MS-1, MS-2, MS-3, and MS-4 were 0.72, 0.74, 0.78, and 0.75 respectively. It showed poor results in PM10 prediction also. In particular, the model performs worst in MS-1 for PM10 prediction. Samal et al. (2019) and Ye (2019) used this time series model for predicting air pollutants in India and China respectively. The performance of this model used in Samal et al. (2019) was better than this study. The RMSE value (= 3.54 μg m−3) of this model was much satisfactory for SPM simulation. Unlike this study, Ye (2019) found the RMSE values for PM2.5 and PM10 prediction were 10.34 μg m−3 and 15.5 μg m−3 respectively.

L-SVM and M-SVM

From Table 2, it is clear that both SVM models are comparable in terms of prediction metrics and they showed good performances in the prediction of daily mean PM2.5 and PM10 concentrations. In particular, M-SVM performed better than L-SVM. During the training period, M-SVM showed higher R2 and lower RMSE values than the L-SVM. For PM2.5 prediction of MS-1, MS-2, MS-3, and MS-4, the RMSE values of the M-SVM were 8.89 μg m−3, 10.6 μg m−3, 9.89 μg m−3, and 10.2 μg m−3, respectively, whereas for L-SVM, they were 8.57 μg m−3, 12.3 μg m−3, 10.7 μg m−3, and 10 μg m−3, respectively. Like PM2.5, M-SVM showed better performance for PM10 prediction. In terms of R2 value, for PM10 prediction, among the MS-1, MS-2, MS-3, and MS-4, the highest value was experimented in MS-3 (L-SVM = 0.93 and M-SVM = 0.94) for both models. The lowest MAE value (= 4.87 μg m−3) was found in MS-1 during the M-SVM execution. Over-fitting was controlled in this study during the model execution. Generally, over-fitting occurs when the results of testing are greater than the validation (Mehdipour et al. 2018). Singh et al. (2013) used an SVM model for predicting urban air quality in India where the RMSE value was 9.14 μgm−3 and 9.22 μgm−3 during testing and training period, respectively. However, in Tehran, Mehdipour et al. (2018) experimented much lower RMSE value (0.0501 μgm−3 and 0.519 μgm−3 during testing and training respectively) using SVM models to predict PM2.5 concentration.

Table 2 Performance metrics (R2, RMSE, and MAE) of the models during testing and training period. Here, MS-1, MS-2, MS-3, and MS-4 are representing Dhaka, Chattogram, Rajshahi, and Sylhet respectively. The values in italics indicate the best results among the models

ANN

For the MS-2 and MS-4, the best prediction result was obtained using ANN. To predict the PM2.5, the lowest RMSE value (= 9.42 μg m−3) and MAE (= 4.93 μg.m−3) were found in MS-2 during the test period. On the other hand, for PM10 prediction of MS-1, MS-2, MS-3, and MS-4, the RMSE values of the ANN were 13.8 μg m−3, 14.2 μg m−3, 14.7 μg m−3, and 14.9 μg m−3, respectively. Özdemir and Taner (2014) used multiple linear regression and ANN to predict the PM10 concentration in Turkey. The accuracy of the back-propagation feed-forward ANN (BPNN) with two hidden layers for the urban and industrial zone was 87% and 49% respectively. Feng et al. (2015) studied ANN with a trajectory model and wavelet model to improve the forecasting performance of PM2.5 in China. Using the ANN model particularly, they found RMSE values ranged from 28.63–36.78 μg m−3.

RFR

RFR performed the best results for MS-4 to predict PM10. During the testing period, RFR showed high R2, and lower RMSE, and MAE values than the L-SVM, M-SVM, and PROPHET. For PM2.5 prediction of MS-1, MS-2, MS-3, and MS-4, the RMSE values of the RFR were ranged from 9.2 to 11.6 μg m−3 and 9.59 to 11.9 μg m−3 during training and testing period respectively. Like PM2.5, the performance of RFR was better than the SVMs. In terms of R2, RMSE, and MAE value, for PM10 prediction, among the MS-1, MS-2, MS-3, and MS-4, the highest value was experimented in MS-3 during model execution (R2 = 0.91, RMSE = 13.7 μg m−3, MAE = 7.57 μg m−3). In China, Hu et al. (2017) experimented with RF and the R2 value was 0.80 on average. However, Joharestani et al. (2019) used 23 features including meteorological variables, geographic data, and ground measured concentration data to predict PM2.5 in Tehran. In that study, the R2 and RMSE values varied from 0.66 to 0.78 and 14.47 to 15.30 μg m−3, respectively.

GPR

Among the models in this study, GPR showed the best performance particularly for both PM2.5 and PM10 concentration for MS-1 (R2 = 0.91, RMSE = 7.68 μg m−3, MAE = 3.59 μg m−3 for PM2.5; R2 = 0.90, RMSE = 12.8 μg m−3, MAE = 7.62 μg m−3 for PM10) and MS-3 (R2 = 0.92, RMSE = 8.72 μg m−3, MAE = 4.17 μg m−3 for PM2.5; R2 = 0.91, RMSE = 12.1 μg m−3, MAE = 6.89 μg m−3 for PM10) both in training and testing period. During training, the R2, RMSE, and MAE values were ranged from 0.91 to 0.94, 7.68 to 11.3 μg m−3, and 3.59 to 6.87 μg m−3 for PM2.5, respectively, and 0.87 to 0.95, 12.5 to 11.2 μg m−3, and 6.76 to 7.61 μg m−3 for PM10, respectively. The worst performance of GPR was found for PM10 prediction in MS-2 (Table 2). A study in Tehran, i.e., Mehdipour et al. (2018), used Bayesian network to predict PM2.5 where the final RMSE value was 0.1077. Figure 1 represents the overall validation results of the models.

Fig. 1
figure 1

Taylor diagram of validation results of machine learning models

Model comparison and proposed model

Figure 2 represents the selection of the best model for the monitoring stations. The training results showed that the over-fitting was controlled perfectly in this study. From Fig. 1 and Table 2, it was clear that for PM2.5 and PM10, the PROPHET time series model performed worse than the machine learning models. However, GPR showed the best performance among all the models, particularly in MS-1 and MS-3. Therefore, GPR was selected as the best model for the prediction of both PM2.5 and PM10 in MS-1 and MS-3. In contrast, ANN well performed only for PM2.5 prediction in MS-2 and MS-4. Unlike PM2.5, RFR and M-SVM were selected for PM10 for MS-4 and MS-2 respectively. The results of the models are further compared in terms of exploratory variables. Initially, the study used only meteorological variables to predict particulate matters. The initial results of the model validation using only meteorological parameters are presented in the supplementary section (Table S6). Finally, when the chemical species such as NOX, SO2, CO, and O3 were fed into the machine learning models, the study found more meaningful results than before. The use of source pollutants in the models decreases the RMSE values for the models. Therefore, in terms of developing the machine learning models, the study recommends the use of more source pollutants and meteorological variables to reveal more fruitful results.

Fig. 2
figure 2

Proposed models for predicting and monitoring the concentration of PM2.5 and PM10 across the metropolitan areas in Bangladesh

Conclusion

Machine learning provides reliable forecasting of atmospheric pollution. This research, therefore, explores the application of ML models in the management of PM and air quality in Bangladesh. Five models, i.e., ANN, L-SVM, M-SVM, RFR, and GPR were used in fulfilling the purpose of the study with a comparison of time series model namely PROPHET. Among them, for Dhaka and Rajshahi, GPR showed the best results in terms of R2, RMSE, and MAE evaluation metrics. Therefore, the study recommended using GPR to predict the concentration of both PM2.5 and PM10 in those two stations. However, to predict the PM2.5 and PM10 concentration in Chattogram and Sylhet, the study referred to individual models. The proposed model for PM2.5 for Chattogram and Sylhet was ANN, whereas for PM10, the models were M-SVM and L-SVM respectively. However, the study recommends using data for a longer period to examine the performance of the models. Moreover, the hybrid models could be an option to compare it with these models. After all, the obtained results from this study revealed that the machine learning can offer convenient information that the government officials and policy makers of different countries can utilize it to issue early alerts of atmospheric pollution incidents and accordingly protect the citizens from exposure.