Introduction

India is the second most populated country in the world, is densely populated, and is one of the fastest growing economies. It faces severe road congestion, particularly on undivided two-lane highways with mixed traffic. Improving infrastructure, imposing appropriate taxes to restrict the growth of personal vehicles and enhancing public transport facilities are long term solutions to this problem. These permanent solutions need the government’s involvement. The Indian government has invested heavily in the urban infrastructure sector. Public transport systems such as Bus Rapid Transit and Metro are being built in several new places to encourage the use of public transport. However, the number of private vehicles is still growing rapidly [1]. The country’s growing population is another reason for increased transportation needs. Meeting these needs through infrastructure growth alone appears less viable because of space and cost constraints.

In general, short term traffic forecasting analyzes previous traffic data to predict the traffic flow for the next 5–30 min. This horizon matters because traffic can grow drastically within 5 min and become heavily congested a few minutes later. If the traffic situation in the next 5 min can be predicted, traffic engineers and police can take suitable action, such as diverting traffic to other routes, when high congestion is likely. Hence, short term traffic forecasting is an important area of investigation, specifically for two-lane undivided highways with mixed traffic in India. Researchers in India have considered 4-lane divided highways for short term traffic prediction, but two-lane undivided highways with mixed traffic still require extensive investigation. This is the motivation for the present study. Accordingly, this study identified a two-lane undivided highway stretch with mixed traffic conditions and developed a short term traffic forecasting model using a back-propagation neural network. The rest of the paper is organized as follows: “Literature review” section provides a literature review of relevant studies. “Materials and methods” section describes data collection, data preparation and the back-propagation neural network approach used to develop the short term traffic forecasting model. “Results and discussion” section presents the results and discussion. Finally, the study is concluded in “Conclusion” section.

Literature review

Intelligent management of traffic flow, together with more accurate information for travelers about traffic and road status, can reduce the negative impact of congestion [2]. Developing intelligent transport systems (ITS) in India requires extensive, high quality research and development efforts. Traffic flow forecasting is very important in the design, planning and operation of highways, and short term traffic flow forecasting has been identified as an important aspect of ITS [3]. Its importance for ITS has long been seen in many applications, including the development of traffic control strategies in advanced traffic management systems [4] and Advanced Traveller Information Systems [5]. Furthermore, traffic forecasting can help drivers save time and can help reduce traffic congestion and air pollution. Predicting traffic parameters requires past records for the modeling process.

Several studies have considered short term traffic flow prediction and developed various methodologies. Kalman filtering [6], local linear regression [7], neural networks [8] and fuzzy logic based models [9] are some of the methods used for short term traffic flow prediction. Because the traffic stream behaves stochastically and highly non-linearly, machine learning techniques [10] have received great attention as an alternative for traffic flow prediction. Dougherty and Cobbett [11] used a back propagation neural network to develop a model to predict traffic flow, speed and traffic occupancy in the Utrecht/Rotterdam/Hague region of The Netherlands. They mentioned that an elasticity test can be a good option for interpreting the developed neural network model. A comparative study between neural network and statistical models for short term traffic flow forecasting on motorway traffic data in France was done by Kirby et al. [12]. Dia [13] proposed an object oriented neural network approach for the prediction of short term traffic conditions on a highway stretch between Brisbane and the Gold Coast in Queensland, Australia. Wang and Shi [14] used a support vector machine (SVM) model for short term traffic prediction. They noted that appropriate selection of kernel parameters for SVM is a major challenge, and they introduced an approach that uses wavelet theory to construct a new kernel function able to capture the non-stationary properties of short term traffic speed data. They then tested this approach on real world traffic speed data.

Theja and Vanajakshi [15] considered mixed and less lane-disciplined traffic data with homogeneous traffic flow on an Indian road. They used SVM and back propagation ANN to develop a traffic prediction model and reported that the SVM method was more accurate in their study. In a different study, Centiner et al. [16] considered homogeneous traffic flow and used an ANN model to develop a short term traffic forecasting model on traffic data collected in Istanbul. They mentioned that day of week, hour and minute played an important role in traffic volume prediction. The stability and efficiency of neural networks for short term prediction of traffic volume under mixed Indian traffic flow conditions on 4-lane undivided highways were studied by Kumar et al. [17]. They used traffic volume, speed, traffic density, time and day of week as input parameters of an ANN model for traffic flow forecasting, and reported that the performance of the ANN remained consistent when the prediction time interval was changed from 5 min to 15 min. The authors of [18] used an adaptive Kalman filter approach for short term traffic flow rate prediction and uncertainty quantification. They developed a short term traffic prediction model for real world traffic data collected from four different highway systems, in the United Kingdom and in Minnesota, Washington and Maryland in the USA. They suggested that the adaptive Kalman filter is highly effective for highly volatile traffic.

Habtemichael and Cetin [19] proposed a non-parametric prediction model using an enhanced k-nearest neighbors approach for short term traffic flow rate prediction. They applied and tested this model on 36 datasets (12 from the United Kingdom and 24 from the USA) collected from different regions, and found that it outperformed the advanced parametric models used in the study. Further, Ma et al. [20] pointed out that accuracy is very important in short term traffic flow prediction. They proposed a 2-dimensional prediction method using Kalman filtering for historic traffic data and reported better accuracy than the standard Kalman filtering approach. Guo et al. [21] suggested that, considering the future scenario of ITS, interval prediction is more important and challenging for traffic managers than point prediction. They used a fuzzy information granulation method along with ANN, SVM and KNN methods to develop a forecasting model for both point and interval prediction on real world traffic data collected from American field transportation systems. Their results showed that the stability of the prediction systems increased as the time interval increased.

Such studies are possible because high quality data are available from the advanced technology used in ITS in European countries and the United States, where road users are disciplined towards traffic rules. India presents a different scenario: not all roads are well constructed, and road users are less disciplined towards traffic rules. Two-lane undivided roads are a common feature of state and national highways in India. Very few studies in India [22, 23] have taken two-lane undivided highways into consideration, and their motivation was other than short term traffic flow forecasting. Therefore, it is very important to study two-lane undivided highway stretches with mixed traffic flow, because a large portion of the Indian road network consists of such roads.

The key objective of this study is to develop a short term traffic flow forecasting model using back propagation neural network for non-urban undivided two-lane roads with mixed traffic flow conditions in India.

Materials and methods

Data collection and preprocessing

The data set used in the study was collected from a two-lane undivided highway stretch between Roorkee and Hardwar on National Highway-58 (NH-58). On NH-58, the Delhi to Muzaffarnagar road is constructed as a four-lane divided national highway and the remaining road is undivided two-lane. In the present study, the undivided two-lane highway stretch on NH-58 from Roorkee to Hardwar is selected. Three locations, L1 (near hotel Prakash, Roorkee), L2 (near Rehmadpur) and L3 (near Badheri), shown in Fig. 1, were considered for data collection.

Fig. 1 The selected highway stretch on NH-58 (Roorkee to Hardwar Road)

Data were collected during 0900–1200 and 1500–1800 h from 1/4/2017 to 31/8/2017. High quality digital cameras were used to capture the traffic on the entire selected stretch. Traffic flow was assumed to be simple, with no change of direction. All recordings were time-stamped. The recordings were played back on a computer and features were extracted with the help of a computer program written in Python. Further, all vehicles were classified into ten categories (Table 1). Vehicles passing through a trap length were counted manually to obtain the traffic volume data. The speed of each vehicle in every category was calculated by dividing the trap length (20 m) by the difference between its entry and exit times at the trap.
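The speed and volume calculation described above can be sketched as follows. This is only a minimal illustration, not the extraction program used in the study: the record layout, the function names and the sample times are assumptions, and only the 20 m trap length and the 5 min interval come from the text.

```python
from collections import Counter

TRAP_LENGTH_M = 20.0  # trap length stated in the text


def spot_speed_kmh(entry_s, exit_s, trap_length_m=TRAP_LENGTH_M):
    """Speed over the trap: trap length divided by the entry-to-exit time difference."""
    dt = exit_s - entry_s
    if dt <= 0:
        raise ValueError("exit time must be later than entry time")
    return trap_length_m / dt * 3.6  # convert m/s to km/h


def volume_per_interval(entry_times_s, interval_s=300):
    """Count vehicles in each 5 min interval, keyed by interval index."""
    return Counter(int(t // interval_s) for t in entry_times_s)


# Hypothetical records: (category, entry time, exit time), in seconds from the
# start of a recording. The values are made up for illustration.
records = [("BUS", 12.0, 14.0), ("TRUCK", 250.0, 252.5), ("3W", 310.0, 312.0)]

speeds = [spot_speed_kmh(t_in, t_out) for _, t_in, t_out in records]
volumes = volume_per_interval(t_in for _, t_in, _ in records)
```

Averaging `speeds` per vehicle category within each interval would give the per-category average speeds that appear later as model inputs.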

Table 1 Summary of speed and traffic volume measurement for 5 min interval

The measured data have been used in the manner described by de Luca et al. [24]. Data were extracted at intervals of 5 min for both directions. Statistical characteristics of the extracted data are given in Table 1. Since data were collected at the same locations every day during two peak periods (morning and late afternoon) of three hours each, approximately 100 data samples were obtained per day by considering both directions of the road together. In total, 15,069 data samples were obtained for the entire duration.

In this study, 22 parameters, including the frequencies of all vehicle categories, were used to create the database for ANN modeling of traffic volume. For pre-processing, the whole dataset (all exemplars) was first randomized and then divided into three sets: a training set, a cross-validation set and a testing set. Randomization is used to avoid bias in the dataset and to make each subset representative of the entire population. Of the samples, 80% were used for training, 10% for cross-validation and 10% for testing. The criterion in separating the data was to assign sufficient samples to ANN training while reserving some for cross-validation and testing.
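A minimal sketch of such a randomized 80/10/10 split is shown below. The function name, the fixed seed and the use of integer indices as stand-ins for the samples are assumptions made for illustration; the study does not state how the split was implemented.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a copy of the samples, then cut it into train / validation / test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


samples = list(range(15069))  # stand-in indices for the 15,069 samples
train, val, test = split_dataset(samples)
```

Because the fractions are truncated to whole samples, the test set takes up the remainder, so the three parts always cover the dataset exactly once.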

Model development

An ANN can capture the non-linear relationship between input and output features and can provide generalized solutions for forecasting traffic volume. The Multi-Layer Perceptron (MLP) is one of the most popular ANN structures; it contains one or more hidden layers between the input and output layers. An MLP can be used to solve different problems because of the non-linear activation functions between its layers of processing elements, and the choice of activation function plays a critical role in the performance of the network. Each processing unit is initially assigned a random weight. The error is calculated at each epoch by comparing the computed output for each input with the expected output, and back propagation is a widely used technique for propagating this error through the network. The main objective of the optimization process is to minimize the mean square error in the training, cross-validation and testing phases.

There is no limitation on the number of input variables in ANN modeling; the number of input and output variables depends on the type of problem. The literature offers no general approach for creating an ideal neural network architecture, so trial and error is used. Here, trial and error means that the weight parameters of each neuron in the hidden layer are initially chosen at random and are then modified by propagating the prediction error backwards. Parameters such as the number of input variables, the number of hidden layers, the activation (transfer) function and the learning rate play an important role in designing the network architecture. The general architecture of an ANN is illustrated in Fig. 2.

Fig. 2 General architecture of Artificial Neural Network

Previously, an ANN model was used by Kumar et al. [17] for short term traffic flow prediction on a 4-lane highway. Here, we investigate the performance of an ANN model for short term traffic prediction on a two-lane undivided highway with mixed traffic conditions.

An ANN with one hidden layer can be defined as a function

$$y : Z^{A} \to Z^{B}$$
(1)

where A and B are the lengths of the input vector x and the output vector y(x), respectively.

In matrix form, it can be defined as:

$$y\left( x \right) = \varphi \left( {b^{\left( 2 \right)} + W^{\left( 2 \right)} h\left( x \right)} \right)$$
(2)

where,

$$h\left( x \right) = \delta \left( {b^{\left( 1 \right)} + W^{\left( 1 \right)} x} \right)$$
(3)

\(b^{(1)}\), \(b^{(2)}\) are bias vectors, \(W^{(1)}\), \(W^{(2)}\) are weight matrices and \(\varphi\) and \(\delta\) are activation functions.
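The forward pass in Eqs. 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the study's implementation: the sizes (22 inputs, 7 hidden neurons, 1 output) follow the architecture reported in the paper, while the random weights, the choice of tanh for \(\delta\) and the logistic function for \(\varphi\), and all function names are assumptions. A logistic output layer also presumes targets scaled to the unit interval.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Return b + W x for a weight matrix W (list of rows), bias b and input x."""
    return [b_i + sum(w * x_j for w, x_j in zip(row, x)) for row, b_i in zip(W, b)]


def forward(x, W1, b1, W2, b2, delta=math.tanh, phi=sigmoid):
    """One-hidden-layer network: hidden activations h(x), then output y(x)."""
    h = [delta(z) for z in affine(W1, b1, x)]   # Eq. 3
    return [phi(z) for z in affine(W2, b2, h)]  # Eq. 2


rng = random.Random(0)
A, H, B = 22, 7, 1  # input length, hidden neurons, output length
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(A)] for _ in range(H)]
b1 = [0.0] * H
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(B)]
b2 = [0.0] * B

x = [rng.random() for _ in range(A)]  # a made-up, already scaled input vector
y = forward(x, W1, b1, W2, b2)
```

With untrained random weights the output is meaningless; the sketch only shows how the dimensions A and B and the two activation functions fit together.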

Two popular activation functions, sigmoid(x) and tanh(x), are used in this study. Equations 4 and 5 give the mathematical notation, and Fig. 3a, b gives the graphical illustration, for tanh(x) and sigmoid(x), respectively.

$$y\left( {x_{i} } \right) = \tanh \left( {x_{i} } \right)$$
(4)
$$y\left( {x_{i} } \right) = \frac{1}{{\left( {1 + e^{{ - x_{i} }} } \right)}}$$
(5)

Fig. 3 Activation functions

Both activation functions are sigmoidal (S-shaped): the first is the hyperbolic tangent, with range [−1, +1], and the second is the logistic function, with range [0, 1].

Initially, random weights are assigned to the hidden layer because suitable weight values are not known in advance. A loss function is therefore required to adjust the weights. This loss function (Eq. 6) measures the error between the predicted output and the actual output, and this error is then propagated backwards.

$$L_{f} = \frac{1}{2}\sum (y - y^{\prime})^{2}$$
(6)

\(y^{\prime}\) is the predicted output and y is the actual output. The goal is to minimize the loss function by changing the weight matrix, which can be done using the gradient descent method. Gradient descent computes the rate of change of the error with respect to each specific weight. The weight matrix is then updated using the weight update equation given in Eq. 7.

$$W_{ab} = W_{ab} + \Delta W_{ab}$$
(7)

where \(W_{ab}\) represents the connection weight from a neuron in layer a to a neuron in layer b and \(\Delta W\) is given by,

$$\Delta W = - \theta \frac{{\partial L_{f} }}{{\partial W_{ab} }}$$
(8)

\(\theta\) is the rate of learning.
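As a small worked illustration of gradient descent on the loss in Eq. 6, the sketch below trains a single linear neuron \(y^{\prime} = wx\) on one sample, moving the weight against the gradient at each step. The single-weight model, the sample values and the learning rate are assumptions chosen to keep the arithmetic easy to follow; a full back-propagation pass applies the same step to every weight in the network.

```python
def loss(y, y_pred):
    """Squared-error loss for one sample (Eq. 6 with a single term)."""
    return 0.5 * (y - y_pred) ** 2


def gradient_step(w, x, y, theta):
    """One gradient descent step for the model y_pred = w * x."""
    y_pred = w * x
    grad = -(y - y_pred) * x  # derivative of the loss with respect to w
    return w - theta * grad   # step against the gradient


# Made-up sample: the weight that fits it exactly is w = 2.
w, x, y, theta = 0.0, 2.0, 4.0, 0.1
losses = []
for _ in range(20):
    losses.append(loss(y, w * x))
    w = gradient_step(w, x, y, theta)
```

With these values each step maps w to 0.6w + 0.8, so the weight approaches 2 and the recorded losses shrink at every iteration. A learning rate that is too large would make the same loop diverge.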

Development of ANN models

In this study, a multilayer perceptron network has been used to predict traffic flow 5 min into the future using the past 55 min of data. For development of the ANN model, 216 data samples have been taken, each of which contains 22 features, i.e. location, time of day, the frequencies of the 10 vehicle categories and the average speed of each vehicle category. A single class speed flow model is not sufficient for explaining traffic conditions in India because roads are not reserved for one type of vehicle: all kinds of vehicles (motorized and non-motorized) share the same road and their speeds vary. Therefore, the average speed of each vehicle class is considered to predict the multiclass traffic flow of the undivided two-lane highway. The ANN model was developed and implemented in Anaconda Spyder (version 3.6) using the scikit-learn package. The best performing neural network structure is obtained by finding the best values of the network parameters for training and testing. Because other approaches available in the literature [25,26,27] failed to give appropriate values of the network parameters, the trial and error approach has been used. The stopping criterion during training was the least mean square error (MSE).
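One possible reading of "5 min ahead from the past 55 min" is a sliding window over consecutive 5 min intervals: eleven past intervals form the input and the following interval is the target. The sketch below illustrates only that reading. The function name and the volume series are hypothetical, and the study does not state how its 22-feature samples were arranged in time.

```python
def make_windows(series, n_lags=11):
    """Pair each run of n_lags consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - n_lags):
        X.append(series[i:i + n_lags])  # 11 intervals of 5 min = past 55 min
        y.append(series[i + n_lags])    # the next 5 min interval
    return X, y


# Hypothetical 5 min traffic volume counts.
volumes = [30, 32, 35, 31, 40, 42, 38, 36, 41, 45, 44, 47, 50]
X, y = make_windows(volumes)
```

A series of 13 intervals yields two input-target pairs here; each additional interval adds one more pair.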

Twelve different ANN models have been developed and trained on the dataset. The specifications of the 12 models, which have different structures, are presented in Table 2. Table 2 shows that the neural network with 7 hidden neurons gives the best prediction result. Thus, the architecture of the ANN model in the present study has 22 inputs, 7 neurons in the hidden layer and a single output. The performance of the ANN models was determined using the cross-validation and testing data sets. The coefficient of correlation (r), mean absolute error (MAE), mean square error (MSE) and normalized mean square error (NMSE) were used to evaluate the predicted results.

Table 2 Different ANN networks architectures for traffic volume prediction

Sensitivity analysis of traffic volume parameter to input

Sensitivity analysis measures the variation in the performance of a model with a change in input value [28, 29]. Applying sensitivity analysis to a trained network makes it possible to identify and eliminate irrelevant inputs, which may reduce data collection cost and improve the network’s performance. Moreover, sensitivity analysis gives an understanding of the fundamental relations between the input variables and the output. In this investigation, the ANN model (Model 12) was used for sensitivity analysis, and sensitivity about the mean was computed on the trained MLP network. The procedure starts by varying the first input between its mean ± 5 while all other inputs are held at their respective means. The network output is calculated for one hundred steps above and below the mean. This procedure is then repeated for each input. Figure 4 shows the deviation of the output with respect to the deviation of each input. According to the sensitivity analysis, the thirteen most significant input factors are Time, CR/JP/VN, BUS, TRUCK, MB/TT, 3 W, ST/MT, BCYCLE, PRICSHAW, TT, BCART/HCART, SPRICSHAW and STT (as depicted in Fig. 4). In the next stage, a neural network with the same structure as the best selected ANN model (Model 12) was trained and verified using only these 13 most important input parameters. The results of the training, cross-validation and testing stages of this sensitivity model are given in Table 3. Table 3 shows that, after the number of input variables is reduced from 22 to 13, the sensitivity model does not perform as well as the best performing proposed ANN model, i.e. Model 12.
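The sensitivity-about-the-mean procedure described above can be sketched as follows. A toy linear function stands in for the trained network, and the function names, the mean values and the interpretation of "mean ± 5" as ± 5 in each input's own units are all assumptions made for illustration.

```python
def sensitivity_about_mean(model, means, index, delta=5.0, steps=100):
    """Vary one input over mean ± delta while holding the others at their means."""
    outputs = []
    for k in range(-steps, steps + 1):  # `steps` points below and above the mean
        x = list(means)
        x[index] = means[index] + delta * k / steps
        outputs.append(model(x))
    return outputs


def output_range(outputs):
    """Spread of the model output over the sweep; larger means more sensitive."""
    return max(outputs) - min(outputs)


def toy_model(x):
    """Hypothetical stand-in for a trained network; input 2 has no effect."""
    return 3.0 * x[0] - 0.5 * x[1] + 0.0 * x[2]


means = [10.0, 20.0, 30.0]
ranges = [output_range(sensitivity_about_mean(toy_model, means, i))
          for i in range(len(means))]
```

Inputs whose sweep barely moves the output (here the third one) are candidates for removal, which is the reasoning behind reducing the input set after the analysis.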

Fig. 4 Sensitivity analysis of input variables of proposed ANN model

Table 3 Performance analysis between proposed ANN Model and sensitivity model

Results and discussion

Several ANN models have been developed and trained on the data. All these models were trained for different numbers of epochs to adjust the weight parameters in the network. Figure 5 illustrates the performance of the best 6 models trained for different numbers of epochs. It shows that increasing the number of epochs reduced the MSE and improved the performance of the models in both the training and validation phases, and that prediction model M12 achieved the minimum MSE among the models. Figure 6 shows the regression plot between observed and simulated traffic volume. Model M12 achieved the highest prediction accuracy among the models, with an \(R^{2}\) value of 0.9919. \(R^{2}\), the coefficient of determination, indicates how well the regression line fits the data: the closer the value of \(R^{2}\) is to 1, the better the prediction model. Further, the performance of each developed model is evaluated using mean absolute error (MAE), mean absolute percentage error (MAPE), Theil’s U statistics (U1, U2), cumulative forecast error (CFE) and variance of absolute percentage error (VAPE). MAE, MAPE and Theil’s U statistics (Eqs. 9 and 10) measure the accuracy of prediction, VAPE measures prediction stability and CFE is used for bias estimation.

Fig. 5 MSE vs Epoch curve for best 6 models

Fig. 6 Simulated (predicted vehicle frequency) vs observed traffic volume for best 6 models

Table 4 provides the values of all these parameters for the different ANN models. It shows that Model M12 achieved the best score on every parameter in comparison with the other developed models.

$$U_{1} = \frac{{\left( {\sum\nolimits_{i = 1}^{n} \left( {P_{i} - A_{i} } \right)^{2} } \right)^{1/2} }}{{\left( {\sum\nolimits_{i = 1}^{n} A_{i}^{2} } \right)^{1/2} }}$$
(9)
$$U_{2} = \frac{{\left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} \left( {A_{i} - P_{i} } \right)^{2} } \right)^{1/2} }}{{\left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} A_{i}^{2} } \right)^{1/2} + \left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} P_{i}^{2} } \right)^{1/2} }}$$
(10)
Table 4 Statistical indices of different models

In the above equations, \(A_{i}\) and \(P_{i}\) denote the actual change and the predicted change in values, respectively. U1 and U2 are measures of prediction accuracy and quality, respectively. Equations 11–15 define MAE, MPE, MAPE, VAPE and CFE, respectively.

$$MAE = \frac{1}{N}\sum \left| {P_{i} - A_{i} } \right|$$
(11)
$$MPE = \frac{1}{N}\sum \frac{{\left( {A_{i} - P_{i} } \right)}}{{A_{i} }} \times 100$$
(12)
$$MAPE = \frac{1}{N}\sum \frac{{\left| {A_{i} - P_{i} } \right|}}{{A_{i} }} \times 100$$
(13)
$$VAPE = Var\left( {\frac{{\left| {P_{i} - A_{i} } \right|}}{{A_{i} }}} \right) \times 100$$
(14)
$$CFE = \sum \left( {A_{i} - P_{i} } \right)$$
(15)
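The indices above can be computed with a short routine such as the following sketch. It assumes the conventional definitions, in which MPE and MAPE are based on the relative error (not its square) and U2 divides the root mean square error by the sum of the root mean squares of the actual and predicted series. The use of the population variance for VAPE, the function name and the two-point example are also assumptions.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAE, MPE, MAPE, VAPE, CFE and Theil's U1 and U2 for two series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ape = [abs(e) / a for e, a in zip(errors, actual)]  # absolute relative errors
    mean_ape = sum(ape) / n

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / n)

    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(e / a for e, a in zip(errors, actual)) / n * 100,
        "MAPE": mean_ape * 100,
        # Population variance of the absolute relative errors (an assumption).
        "VAPE": sum((v - mean_ape) ** 2 for v in ape) / n * 100,
        "CFE": sum(errors),
        "U1": math.sqrt(sum(e * e for e in errors))
        / math.sqrt(sum(a * a for a in actual)),
        "U2": rms(errors) / (rms(actual) + rms(predicted)),
    }


# Made-up example: one over-prediction and one under-prediction of equal size.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
m = forecast_metrics(actual, predicted)
```

In this example the two errors cancel, so CFE is zero even though MAE is 10. This is why CFE serves as a bias indicator and not as an accuracy measure.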

The performance of the proposed traffic volume prediction model has been compared with approaches used in earlier studies (Table 5). Random forest and regression tree [30], SVM regression [31], K-nearest neighbors [32] and multiple linear regression [33] models have been developed on the same dataset. The BPANN (M12) model is found to predict traffic volume more accurately than the other approaches (as shown in Table 5). The statistical indices indicate that the ANN based model is robust and stable and can be applied successfully for short term traffic flow prediction in Indian traffic conditions. The present study combines ANN modeling with sensitivity analysis, which provides a systematic and accurate way to predict traffic volume.

Table 5 Performance comparison between proposed ANN based model and traditional models

In this study, 7 nodes have been used in the hidden layer, which provides good results. However, this number is not optimal for every model; it should be selected according to the requirements and the size of the data. The back propagation algorithm can be utilized in Advanced Traveller Information Systems (ATIS) because of its efficiency in reducing prediction error. ATIS is a system that makes predictions based on the information stored in its database [34]. The more accurate the predictions, the better the suggestions of ATIS will be. Therefore, there is a demand for more robust and accurate short term prediction models. Our model achieved a better \(R^{2}\) score (0.9962) than the other models for short term traffic flow prediction on two-lane undivided highways with mixed traffic conditions. However, if processing time is more important and a small loss of accuracy is acceptable, methods such as SVM and conventional regression-based estimators can be good alternatives to ANN.

Conclusion

This study presented an ANN based short term traffic volume prediction model for undivided two-lane highways with mixed traffic in India. The data samples were collected on the NH-58 highway stretch from Roorkee to Hardwar. The study used a back propagation neural network approach to develop the short term forecasting model. The major advantage of the back-propagation neural network is that it calculates the prediction error and propagates it back to the previous layers in order to modify the weights, resulting in better prediction accuracy with more training. Training can be stopped once a certain number of epochs pass with no further improvement in prediction accuracy. The results of the best back propagation model (M12) were quite promising, as it achieved an \(R^{2}\) value of 0.9962. This study can be useful in Advanced Traveller Information Systems for short term traffic predictions.

The study offers a workable solution for short term traffic prediction on two-lane undivided highways with mixed traffic conditions in India. However, the dataset used is restricted to a limited portion of the highway stretch and to limited days of the week. The study can be enhanced by using more informative traffic flow data covering weekdays and weekends, peak and normal hours, and different months or seasons. Although such data collection requires considerable human and technological effort, it would yield a larger and more informative dataset. Moreover, such a large dataset should be analyzed with more suitable algorithms, e.g. deep neural networks, which are known to handle large datasets efficiently. Future work will address the above mentioned limitations of the study and aim to provide a better solution.