Introduction

The Air Pollution Index (API) describes the current state of air quality in a given region. The Department of Environment (DOE), Ministry of Environment and Water is one of the government agencies responsible for monitoring air quality at 68 stations in Malaysia. The API was then introduced to measure the cleanliness and efficiency of the air (Leong et al. 2020). The Malaysia Ambient Air Quality Guidelines (MAAQG) is used to determine the level of air quality in Malaysia and is used to measure the concentration levels of particles less than 2.5 μm (PM2.5), particles less than 10 μm (PM10), carbon monoxide (CO), sulphur dioxide (SO2), nitrogen dioxide (NO2) and ozone (O3). When the concentration level is above the level specified in the MAAQG for a long period of time, it will cause negative effects on health and the environment. The API in Malaysia is listed in Table 1 with its categorization as good, moderate, unhealthy and hazardous. Generally, PM10 is identified as a major pollutant that causes unhealthy conditions (DOE 2018). Therefore, PM10 is the main focus of this study.

Table 1 Malaysia Air Pollution Index

According to Azmi et al. (2010), the main causes of air pollution in Malaysia are either mobile sources from cars, buses and planes or stationary sources from power plants, open burning and wildfires, industrial facilities and others. The occurrence of haze in Malaysia is as a result of biomass burning since 1982 interrupting everyday life in Malaysia (Latif et al. 2018). Several haze episodes have been reported since then. These extreme episodes occurred in 1997, 2005 and 2015. Severe haze episodes were recorded in 1997 due to forest fires and large-scale plantations, especially in southern Sumatra and central Kalimantan, both in a neighbouring country, Indonesia. The city of Kuching, Sarawak located in East Malaysia was one of the areas affected by air pollution and haze in Sarawak East Malaysia in 1997. The Kuching API was recorded above 850 during the haze, the most alarming haze in Malaysia (Zakri et al. 2018). A further episode of extreme haze was reported in 2005 (Sahani et al. 2014) which was mainly on the Peninsula’s west coast of Malaysia. At that time, the smoke haze heavily affected the Klang Valley and its surrounding area. It reached its height at the haze emergencies on 11 August 2005, as the Air Pollution Index (API) reading in Port Klang and Kuala Selangor was recorded to be above 500. The latest extreme and long haze episode in Malaysia was reported in September 2015 (Huijnen et al. 2016). PM10 concentration is the most significant major pollutant released by human activity (Sapini et al. 2015). Specifically the PM10 concentration, in most cities of Southeast Asia (Reddington et al. 2014) and in Malaysia (Juneng et al. 2011), is justified as the main atmospheric pollutant. PM10 contributed most to Malaysia’s API until 2017. In mid-2017, PM2.5 had a greater impact on APIs in Malaysia until 2018 (DOE 2018).

There has been a growing interest in using many statistical models in the prediction of air pollution in recent years. One of these is regression techniques which have been used for a long time as predictive tools in many fields especially in the prediction of air pollution. The benefits of regression models are for its ease of use and efficient execution. However, these models are not very good in the prediction of complex situations, as the linear relationship between the selected parameters is determined (Abdullah et al. 2016). The statistical method is limited in clarifying the factors influencing PM10, due to statistical assumptions and the homogeneity of the data. Recent studies have attempted to develop powerful computing intelligence models using machine learning algorithms such as the neural network to predict the complex PM10 concentration system, which indicate that such models can easily predict the desired value (Abdullah et al. 2017). However, machine learning, more specifically the neural network, is usually used as a black box where there is no specific understanding of the physical characteristic of the technique (Viotti et al. 2002).

The boosted regression tree (BRT) model, another type of machine learning, which combines the advantages of regression trees with the boosted adaptive method, has recently been used in air pollution prediction studies. The boosting method was first developed by Friedman in (2001), and later added a stochastic aspect to the boosting algorithm through a random sample of the training data sets (Friedman 2002). In addition, it can also be used as a general method that is useful to improve the model accuracy of each learning algorithm. The BRT produces an ensemble model by boosting the loss function (such as root mean square error) of the user-defined number of additional trees by minimizing it. In contrast to the black box technique, the BRT method would evaluate the response of variables based on the individual model variable. It is therefore possible to determine, rank and describe the relationship between variables (Yahaya et al. 2019). The BRT is also capable of handling various types of inputs (i.e. categorical and continuous data) and accepts missing values (Motevalli et al. 2019) and able to deal with multiple forms of loss functions (Ridgeway 2012), such as Gaussian, Laplace, quantile regression (QR), Bernoulli and Poisson.

The loss function is one of the BRT model factor considerations. Ordinary least squares (OLS) loss function has been used by many studies, for the purpose of minimizing the squared error for continuous predictors, which resulted in a better correlation between the observed value and the estimation of the generalized boosting model (GBM) (Gu et al. 2019). However, datasets that have outliers such as air pollution data are not suitable to be used in OLS function. According to Kudryavtsev (2009), QR has become an important robust alternative tool, as it is more resistant to outliers and it is free function and does not have any properties.

The QR has the ability to be more useful and precise, since the non-central location of a distribution can be represented in all quantiles (Lingxin and Naiman 2007). The QR has the capability of including models for all quantiles, evaluating the entire function and calculating the central tendency (such as mean, median and mode) in the entire function of the variable of interest. The advantage of QR is for its robustness against non-OLS distribution which was found by Schlink et al. (2010). It can also be adapted to unbalanced observational frequencies. Due to this property, QR was considered and selected as a loss function strategy for this study.

The aim of this study is to derive air pollution modelling based on the loss function of QR using the BRT method. It is clear from the literature that no study has been conducted using such a method to predict PM10 concentrations. The finding from the proposed methodology is compared with the prediction obtained from the OLS loss function using the BRT method.

Methodology

The process of data preparation has been conducted in detail to reach for developing the model evaluation as illustrated in Fig. 1. The flow diagram is adapted to the author’s research and is reconstructed.

Fig. 1
figure 1

Building the prediction model workflow

Data preparation

Three urban sites were selected for this study. Table 2 shows the characterization of each station. All stations are located in the peninsular Malaysia. Alor Setar station (CA0040) is located in the northern region, Klang (CA0011) is located in the west coast region and Kota Bharu (CA022) is located in the east coast region as shown in Fig. 2. Data are operated by the Department of Environment’s continuous air quality monitoring (CAQM) stations in Malaysia. CAQM is an integrated ambient air quality monitoring device, is outfitted with a variety of ambient air analyses and sensors to identify particular pollutants. The analyses and sensors operate in a continuous mode, with data collected being captured on a microcomputer-based data acquisition system (DAS) that also controls the performance of the analyses and sensors. On an hourly basis, data is collected and transferred to a central computer for review and reporting. The United States Environmental Protection Agency (USEPA) has authorized the monitoring instruments and operational protocols of CAQM stations (Kamarul Zaman et al. 2017).

Table 2 Characteristics of monitoring station sites
Fig. 2
figure 2

Location of the monitoring sites (Ul-Saufie et al. 2012a, b)

For data exploration, a descriptive analysis is carried out to determine the existence of extreme values or missing values. Missing data is a problem commonly faced by researchers in environmental studies. Data discontinuities are a major obstacle to the prediction models that require continuous information for the majority of the parts to be used. The absence of any data prevents the ability to accurately conclude or interpret the observation (Noor et al. 2014). The missing data must be processed, because complete data are required to perform statistical analysis. This study used linear interpolation for missing data imputation. According to Noor et al. (2015), this linear interpolation method estimates the missing data better than that of the other methods.

Data pre-processing

Maximum daily data used in this study were furnished by the Department of Environment (DOE), Ministry of Environment and Water of Malaysia for the period of 2002 to 2017. The data for this project are confidential, but may be obtained with Data Use Agreements with the Department of Environment (DOE), Ministry of Environment and Water of Malaysia. The data was 80% randomly selected for training and another 20% for the validation of the model (80% for model development and 20% to evaluate the performance of the model). The variables used in this study consist of gaseous nitrogen dioxide (NO2; ppb), carbon monoxide (CO; ppb), sulphur dioxide (SO2; ppb), ozone concentration (O3; ppb), particulate matter concentration (PM10; μgm−3) and meteorological parameters such as wind speed (WS; km/h), relative humidity (RH; %) and temperature (T; °C), as the predictors used to predict PM10 concentrations 3 days ahead. All the selected parameters in this study have an influence on forecasts of PM10 concentrations for 3 days ahead, and had been used by previous researchers, as summarized in Table 3. The general models for this study are shown in Table 4.

Table 3 Variable selection by the previous studies in the prediction of PM10 concentration level
Table 4 General model of BRT

where

PM10,D + 1:

Next-day prediction of PM10 concentration

PM10,D + 2:

Next 2 days prediction of PM10 concentration

PM10,D + 3:

Next 3 days prediction of PM10 concentration

PM10,D:

Particulate matter (μg/m3)

COD:

Carbon monoxides (ppb)

NO2,D:

Nitrogen dioxide (ppb)

SO2,D:

Sulphur dioxide (ppb)

O3,D:

Ozone (ppb)

RHD:

Relative humidity (%)

TD:

Temperature (°C)

WSD:

Wind speed (km/h)

Model development

BRT is a method used to increase the accuracy of a single model by fitting a number of models and combining them for prediction purposes. BRT uses regression trees from the classification and regression tree (CART) and constructs boosts to combine model sets (Grunwald et al. 2020). In the BRT, there are several tuning parameters that need to be controlled such as the number of trees (nt), the learning rate (lr) which is the shrinkage parameter used in each iteration to reduce the contribution of the tree, the complexity of the tree (tc) or the interaction depth which is the maximum tree depth of variable interactions. This study fitted BRT models with varying values for nt (10,000), lr (0.01) and tc (5). In version 3.4.2 of the R software, the BRT model was fitted with version 1.6-3.1 of the GBM. The GBM offers three methods for estimating the optimum number of trees, i.e. the cross validation (CV), the independent test set (test) and the out-of-bag estimation (OOB).

This research used 10-fold cross validation as suggested by Ridgeway (2010) to get the optimum number of trees, and then, ten separate testing sets were averaged. Rather than worrying about the block being suitable for testing, CV employs them all, one at a time, and summarizes the results at the end. The independent test set (test) approach uses a single holdout base dataset to determine the optimum number of tree (Ridgeway 2007). This research used a 50% held out test set to find the optimum number of trees as suggested by Ridgeway (2017). Out-of-bag estimation (OOB) is used to evaluate the classifier. According to Martinez-Munoz and Suarez (2010), individual classifiers are trained in standard bagging on independent bootstrap samples extracted with replacement from the set of original data. In general, the size of these samples is chosen to align with the number of the original training dataset. This prescription is arbitrary and does not have to be optimal in terms of the ensemble’s generalization accuracy. The accuracy of the voting classifier is equal to the average of classifier. Bag.fraction 0.5 was used in this research, as suggested by Ridgeway (2020), to improve predictive performance while using the OOB method.

BRT constructs a model as a weighted sum of functions similar to other boosting algorithms. The BRT algorithm steps are summarized accordingly:

$$ {F}_0(x)=\arg \underset{\gamma }{\min}\sum \limits_{i=1}^nL\left({y}_i,\gamma \right) $$
(1)
$$ {F}_m(x)={F}_{m-1}(x)+{\gamma}_m{h}_m(x) $$
(2)

Start the model with a constant value F0(x).The BRT algorithm steps consist of a suitable decision tree and a loss function to determine how well a study is predicted. At each stage, the decision tree hm(x) is chosen to minimize the loss given the current model Fm − 1 and its fit Fm − 1(xi). The residuals ri, m are computed:

$$ {r}_{i,m}=-{\left[\frac{\partial L\left({y}_i,F\left({x}_i\right)\right)}{\partial F\left({x}_i\right)}\right]}_{F(x)={F}_{m-1}(x)} $$
(3)

ri, m is the negative gradient of the ith sample in the mth as the number of trees. hm(x) is set to use the ri, m as the target variable. Fit a regression tree to the residual ri, m values and create the leaf node area Rj, m for j = 1, 2, …, J. The weights are obtained by solving the problem of minimization:

$$ {\gamma}_{jm}=\underset{\gamma }{\mathrm{argmin}}\sum \limits_{i=1}^nL\left({y}_i,{F}_{m-1}\left({x}_i\right)+\gamma \right) $$
(4)

The square error is the loss function for the deterministic prediction:

$$ L\left({y}_i,F\left({x}_i\right)\right)=\frac{1}{2}{\left({y}_i-F\left({x}_i\right)\right)}^2 $$
(5)

For quantile regression, the expression below is used when the α (quantile) value is in range 0 to 1.

$$ \mathrm{L}\left({\mathrm{y}}_{\mathrm{i}},\mathrm{F}\left({\mathrm{x}}_{\mathrm{i}}\right),\upalpha \right)=\left\{\begin{array}{c}\upalpha \left({\mathrm{y}}_{\mathrm{i}}-\mathrm{F}\left({\mathrm{x}}_{\mathrm{i}}\right)\right),\kern0.5em \mathrm{if}\kern0.5em {\mathrm{y}}_{\mathrm{i}}\ge \mathrm{F}\left({\mathrm{x}}_{\mathrm{i}}\right)\\ {}\begin{array}{cc}\left(\upalpha -1\right)\left({\mathrm{y}}_{\mathrm{i}}-\mathrm{F}\left({\mathrm{x}}_{\mathrm{i}}\right)\right),& \mathrm{otherwise}\end{array}\end{array}\right. $$
(6)

Rj, m is a leaf node, the jth being the number of leaf in the tree and υ is a learning rate. Update the current model:

$$ {F}_m(x)={F}_{m-1}(x)+\upsilon \sum \limits_{j=1}^J{\gamma}_{jm}I\left(x\in {R}_{jm}\right) $$
(7)

It is a method of looping that fits the regression tree. Then, once the first tree is added to the model, tree error prediction will be taken into account to balance and boost the accuracy of the next tree.

$$ {F}_m(x)=\sum \limits_{m=1}^M\sum \limits_{j=1}^J{\upsilon \gamma}_{jm}I\left(x\in {R}_{jm}\right) $$
(8)

Model evaluation

Performance indicators in this research work are used to determine the accuracy and errors of BRT with different loss function (OLS and QR). The indicators used to identify the best method for the prediction of PM10 concentration were the root mean square error (RMSE), normalized absolute error (NAE), predictive accuracy (PA), agreement index (IA) and coefficient of determination (R2). The RMSE and NAE were used to find a model error where a value closer to 0 demonstrated a better model. Meanwhile, the other three performance indicators, i.e. IA, PA and R2, were used to verify the accuracy of the model outcome, where a higher accuracy is given by a value closer to 1. The equations displayed in Table 5 have been indicated by Ul-Saufie et al. (2015).

Table 5 Performance indicator

N = Number of sample hourly measurement of a selected sites.

Pi = Predicted values of hourly data.

Oi = Observed values of hourly.

\( \overline{O} \) = Mean of the observed values of hourly data.

\( \overline{\mathrm{P}} \) = Mean of the predicted values of hourly data.

Results and discussion

The descriptive statistics and box plots for maximum daily PM10 concentrations in Alor Setar, Klang and Kota Bharu from 2002 to 2017 are shown in Fig. 3. Concentrations of PM10 were very high in Klang, Selangor with maximum concentrations 643 μg/m3 over the threshold limit of 150 μg/m3, followed by Alor Setar (385 μg/m3) and Kota Bharu (198 μg/m3). This relates to the fact that Klang is the 13th busiest shipping port and the 16th busiest port in the world. Klang is one of the densely populated and developed areas in Malaysia as there are many industries and business activities in Port Klang. Alor Setar, Klang and Kota Bharu witnessed high particulate events as well as extreme events that promote the increase in PM10 concentrations since the skewness value for Alor Setar (4.03), Klang (4.89) and Kota Bharu (1.72). The distribution is highly skewed, as described in Shaziayani et al. (2018), if the skewness is less than − 1 or greater than + 1. Box plot shows that Alor Setar experienced the highest PM10 concentration in 2016. According to the DOE, this condition is affected by land and forest fires in Sumatra Central, Indonesia, carried by the Southwest Monsoon winds. Klang reached the highest PM10 level during the haze emergency declared on 11 August 2005 as the Air Pollution Index (API) exceeded 500. Due to massive land and forest fires in Sumatra and Kalimantan, Indonesia, Kota Bharu had suffered degradation in air quality during Southwest Monsoon from August to September 2015.

Fig. 3
figure 3

Descriptive statistics and box plots for maximum daily PM10 concentration

The MAAQG control values for CO, NO2, O3, PM10 and SO2 are 8750 ppb (8-h mean reading), 40 ppb (24-h mean reading), 60 ppb (8-h mean reading), 50 μg/m3 (24-h mean reading) and 40 ppb (24-h mean reading). The analysed data for Alor Setar, such as mean, median, standard deviation, skewness, kurtosis and maximum data, are listed in Table 6. The mean values for all five air pollutants which are PM10 (41.99 μg/m3), O3 (34.27 ppb), CO (560.30 ppb), NO2 (15.20 ppb) and SO2 (1.05 ppb) indicate that the average concentration in Alor Setar for 16 years was below the Malaysia Ambient Air Quality Guidelines (MAAQG) for the period from 2002 to 2017. Furthermore, the mean values for meteorological parameters are represented by RH (89.35%), T (32.42 °C) and WS (10.53 km/h). Skewness shows positive values for all air pollutant values. The highest positive skewness value for CO, NO2, O3, PM10 and SO2 is 1.71, 1.10, 0.82, 4.03 and 0.99 indicating the existence of extreme events.

Table 6 Descriptive statistics for Alor Setar, Kedah

Table 7 gives the summary of the descriptive statistics for all parameters’ maximum daily data of Klang for 2002 to 2017. The mean values for the area in 16 years are higher than their respective median which indicates that the pollutant distributions are positively skewed (also called right-skewed). The maximum value for air pollutants was PM10 643 μg/m3, O3 127ppb, CO 10,500 ppb, NO2 128 ppb and SO2 150 ppb. Klang has the highest mean and median values compared to other locations. This may be due to the fact that extensive industry operates in Port Klang, the most densely populated and developed region in Malaysia (AL-Dhurafi et al. 2017, 2018). It has the smallest standard deviation, despite the highest central tendency value, indicating that this area has continuously encountered very high concentrations.

Table 7 Descriptive statistics for Klang, Selangor

Table 8 demonstrates the result of the descriptive analysis of air pollutant concentration and meteorological parameter for Kota Bharu, Kelantan. The mean values for PM10 (48.73 μg/m3), O3 (29.21 ppb), CO (926.26 ppb) and NO2 (15.15 ppb) were higher than the median value. Therefore, the distributions of these measurements were skewed to the right, indicating that there were several observations of high concentration of air pollutant occurred in the years 2002–2017. Meanwhile, the mean value for RH (91.86%) and T (31.36 °C) was lower than the median value which indicates the distribution of data was skewed to the left. These results show that the weather in Kota Bharu is mainly hot and dry, which means that the observation of humidity this year seems to be less humid.

Table 8 Descriptive statistics for Kota Bharu, Kelantan

The relative influence (RI) was computed to identify the strength of each predictor-response variable relationship. According to Sayegh et al. (2016), the BRT modelling technique can be used to identify the influence of different predictors on response variable. The most important predictor identified for the maximum daily PM10 concentration for the next day (D + 1) was PM10 concentration for the previous day, where Alor Setar has 90.17%, Kota Bharu 59.72% and Klang 54.68%. PM10 concentration for the previous day played a remarkable role in explaining more than 50% of the variance in the BRT model. The least important predictor was found to be SO2, where Alor Setar has 0.30%, Kota Bharu 2.77% and Klang 3.02% (Fig. 4).

Fig. 4
figure 4

Relative influence of the selected predictors

The BRT models using OLS loss function and compared test, 10-fold CV and OOB methods are shown in Table 9. Performance indicator has been used to assess the accuracy of the fit to the BRT model in order to determine which method better predicts PM10 concentration in Alor Setar, Klang and Kota Bharu for the 3 days ahead. This study predicts up to 3 days ahead because, according to Perimula (2012), the government will be able to announce warning status if the API exceeds 101 for more than 72 h.

Table 9 Comparison method of best iteration (OLS)

The best OLS loss function in BRT models with the lowest total ranking is shown in Table 10. For error measurements, the values are ranked from the smallest (rank = 1) to the largest (rank = 3), and for accuracy measurements, the values are ranked from the largest (rank = 1) to the smallest (rank = 3). The total ranking has been determined. This procedure was repeated until the next 3-day (D + 3) prediction to decide the best BRT models for the three stations in this study.

Table 10 Ranking of performance indicators for the BRT model to predict D + 1 PM10 concentration

The results show that for the next-day prediction independent test set is better than OOB and CV for all sites. The coefficient of determination (R2) for Alor Setar, Klang and Kota Bharu was 0.70, 0.60 and 0.65, respectively, while the RMSE value was 10.35, 22.13 and 10.27, respectively. The R2 values between the fitted model data and the data set were found to be more than 0.5, suggesting that the model is appropriate and good for the next day’s prediction by using an independent test set. The R2 between the observations and the fitted model obtained from this study indicates how well the BRT model fits.

A comparison among the performances of the lowest error (NAE and RMSE) value and comparable IA, PA and R2 values as for Alor Setar (independent test set), Klang (CV) and Kota Bharu (OOB) indicates that the best method for each site is different for the second-day prediction.

However, the next 3-day prediction suggests that the CV is the best method for Alor Setar and Klang, but for Kota Bharu independent test set is the best method which predicts PM10 concentration. Overall, the model’s performance verified that the next-day prediction is better than the next 2-day and next 3-day prediction.

Descriptive analysis shows that the data for this study is non-central condition because it contains outlier; therefore, this study uses quantile regression as explained by Kudryavtsev (2009). Performance indicators have been used to identify the best quantile to predict the next-day (D + 1) PM10 concentration at Alor Setar as summarized in Table 11. Of the five performance indicators used, NAE and IA indicate that 0.5 quantile gave better fit than other quantiles, but the valley differed by just 0.01 with 0.55 quantile. However, RMSE, PA and R2 have shown that 0.55 quantile is the best quantile in PM10 concentration models. 0.55 quantile was therefore used to predict the PM10 concentration models for the OOB method. For CV and OOB methods, the presented results demonstrate that 0.5 gave better fit than other quantiles.

Table 11 Performance indicators for PM10 concentration prediction (D + 1)

After choosing the right quantile to present the best PM10 concentration prediction models for the next day, repeat the same process for finding the best quantile for the next 2-day and next 3-day prediction for all selected locations.

The best quantile to predict the next 2-day (D + 2) PM10 concentration at Alor Setar is reported in Table 12. Results show that 0.5 is the best quantile for CV and Test method for the next 2-day prediction, while for OOB is 0.4. The chosen quantile for the next 3-day at Alor Setar is described in Table 13. The findings revealed that all methods (OOB, CV and Test) have the same result, which is 0.55 as the best quantile.

Table 12 Performance indicators for PM10 concentration prediction (D + 2)
Table 13 Performance indicators for PM10 concentration prediction (D + 3)

After selecting the best weighting for OOB, CV and Test, the next step is to determine the best method for the next-day, the next 2-day and the next 3-day prediction. The best weighting function was identified for the next day, the next 2 days and the next 3 days in Table 14 for all three monitoring stations by repeating the same procedure for the proposed PM10 concentration prediction method.

Table 14 Comparing the result between quantile regression

The best prediction model for next-day PM10 concentration in Alor Setar is OOB (quantile = 0.55) with an error of 0.1464 (NAE) and 9.3260 (RMSE), with an accuracy of 0.9177 (IA), 0.8546 (PA) and 0.7291 (R2). For Klang and Kota Bharu, CV (quantile 0.5) is the best method. The CV and Test models were selected to predict the PM10 concentration for the next 2-day while for the next 3-day only Alor Setar shows that OOB (quantile = 0.55) is the best method with performance indicators 0.2463 (NAE), 14.4598 (RMSE), 0.6496 (IA), 0.5553 (PA) and 0.3078 (R2). Overall, the results showed that quantile values of 0.5, 0.55 and 0.6 obtained the best quantile results when combined with the BRT method.

The best loss function representing each monitoring station can be identified according to the results of the performance indicator in Table 15. Of the five performance indicators applied, all sites indicate that QR was slightly better than OLS. This is supported by Khan et al. (2019), which states that QR can be utilized for the prediction of extreme events.

Table 15 Comparing the best performance of statistical models for predicting PM10 concentration

Norazrin et al. (2018) investigated the Bayesian regression model using conjugate prior distribution and get the results for RMSE (4.66 to 9.88), IA (0.900 to 0.929), PA(0.830 to 0.866) and R2 (0.614 to 0.665). While, Park et al. (2018) predicted PM10 concentration in Seoul metropolitan subway stations using artificial neural network (ANN) model and presented R2 of 0.39 to 0.81. On the other hand, Abdullah et al. (2020) showed the results from performance error RMSE (126.73–164.98) and NAE (0.33–0.43) by using multiple linear regression for PM10 forecasting during episodic trans-boundary haze event in Malaysia. In addition, Shaziayani et al. (2018) reported that feed forward back propagation performs better than general regression neural network in Seberang Jaya, Pulau Pinang with an IA of as much as 0.7796 for the next day, 0.6033 for the next 2-day and 0.8024 for the next 3-day predictions.

Overall, this implies that the values of performance indicators of this study are almost the same as those of previous researchers. This paper shows that alpha 0.5, 0.55 and 0.60 are the best quantile as recommended by Ul-Saufie et al. (2012), which is appropriate for data on air pollution in Malaysia. Therefore, the proposed model can be used as an alternative method to predict the concentration of PM10 in Malaysia.

Figure 5 shows the comparison between the observed value and predicted value of Alor Setar, Kota Bharu and Klang for the validate data set. The optimum setting value from the training data set is tuned with the number of learning rate at 0.01 and iteration at 10,000. By using the optimum value found in the training process, the accuracy of this BRT prediction is found to be 60.33 to 91.77%.

Fig. 5
figure 5

The observed and predicted maximum daily PM10 concentration

Conclusion

Overall, these results indicate that the quantile regression has fulfilled the assumptions and the good model for BRT for predicting maximum daily PM10 concentration. The study findings show that the values of NAE (0.15–0.17), RMSE (9.33–22.25), R2 (0.60–0.73), IA (0.85–0.92) and PA (0.78–0.85) were good for the next-day predictions. Most of the results used 0.5 as the best quantile which represents the median data, but 0.55 and 0.6 had also been chosen as the best quantile because the model has more number of outliers compare to the other models. Overall, the results showed that the number of quantile is greater than the median value (0.5). In conclusion, QR is an alternative loss function for BRT to predict the 3 days ahead of PM10 concentration for all sites and suitable for data containing influence outlier. This model can help local authority to take action to reduce the effect of haze in Malaysia.