1 Introduction and related work

The World Health Organization (WHO 2018) estimates that about 1.35 million people lose their lives on the roads every year. In the 28 countries of the European Union (EU28 2017) the number of road accident fatalities was 23,500 in 2017. In Italy in the same year, according to the Italian National Institute for Statistics (ISTAT 2017), the deaths and injuries due to road accidents were computed to be 3378 and 246,750, respectively. The magnitude of the number of fatalities and injuries, as well as the associated high social costs due to medical expenses and lost productivity, show that road crashes continue to remain a serious global problem. Therefore, current trends suggest intervening and investing in: (i) technological progress both in vehicles (e.g., increasing electronic control systems on board also to avoid collisions with non-motorized vehicles) and roads (e.g., implementing higher standards of design and maintenance); (ii) more restrictive legislations to mitigate risky behaviours of drivers (e.g., driving under the effect of alcohol and drugs, using mobile phones while driving, exceeding speed limits, unfastening seat-belts); (iii) more advanced safety research (e.g., developing both a greater in-depth knowledge on the methodological frontiers of traditional statistical models used for the analysis of accident data and indicating future directions of statistical modelling). An integrated approach among the three aforementioned issues should be desirable, but unfortunately nowadays this appears still far from being achieved.

In recent years, considerable road safety research efforts have been made for realising a better understanding of the factors that affect the risk of accidents and for modelling the expected crash frequency (number of crashes per year). In this respect, researchers have used several count models of varying complexity, from the classic statistical approach based on the traditional Negative Binomial (NB) regression to more complex models, such as random intercept or correlated parameters ones, or based on different statistical approaches, for instance Bayesian ones. Moreover, multivariate models rather than univariate ones have also been used to model jointly different crash-types. Other statistical models for road crashes are Mannering and Bhat (2014) and Mannering et al. (2016), while an analysis of crash data collected on the Italian roads is presented in Caliendo et al. (2007). However, it is to be stressed that, at the present time, modelling the unobserved heterogeneity in accident data is a major concern for researchers (Aguero-Valverde and Jovanis 2010; Huang and Chin 2010; Mannering and Bhat 2014; Liu and Sharma 2018). Some papers on crashes are based on random parameters models to capture the heterogeneity unexplained by the covariates (e.g., Anastasopoulos and Mannering 2009). Generalized linear models with correlated parameters provide better performance. For example, Yu et al. (2015) show that a correlated parameters Tobit model provides a better fit with respect to the corresponding one assuming uncorrelated parameters. Coruh et al. (2015), and Hou et al. (2018) also find that the correlated random parameters Negative Binomial model is more appropriate than the one with uncorrelated parameters. Similar results are in Caliendo et al. (2019) for a correlated random parameters Poisson model. Multivariate random parameters models provide a further performance increase in terms of goodness-of-fit of road crashes data with different severity levels, see Han et al. (2018), Dong et al. (2014), Guo et al. (2019). However, very few case-studies are available on road crashes with different levels, adopting multivariate models with correlated parameters.

Moreover, most papers are related to accidents occurring on open roads, while accidents occurring in road tunnels have been less investigated. Driving in road tunnels often causes anxiety in drivers because tunnels are dark, narrow, and monotonous (PIARC 2008). As a result, driving behaviour in road tunnels is different when compared to that on open roads. Moreover, since tunnels are enclosed spaces, the escalation of accidents, in terms of consequences might be more serious than on open roads because more users than those directly involved in the accident can be potentially exposed to the risk of collisions. Also the change in light conditions (e.g., the so-called "black hole effect" when entering the tunnel), the disturbing effects on driving due to the tunnel wall (which is too close to the traffic lane), and the excessive stiffness of the tunnel wall (which in absence of safety barriers with re-directive profile is not able to reduce the severity level of the impact when vehicles collide against it) can contribute to more fatalities and/or injuries when compared to those on open roads. An Italian case-study (Caliendo and De Guglielmo 2012) reports 12 severe accidents/108 veh.-km for motorway tunnels in contrast to an average rate of 9 severe accidents/108 veh.-km on the same motorways. Similar results are provided in Zhuang-lin et al. (2009), showing that in China the severity of injuries is higher in freeway tunnels than on open roads. On the other hand, Yeung et al. (2013) report that the traffic accident rates in Singapore expressway tunnels are higher in the transition zones than in the interior zones of tunnels. Therefore, there is evidence that the severity of accidents occurring in tunnels needs further investigations. With reference to Italian tunnels, Caliendo et al. (2013) adopt a bivariate Negative Binomial (NB) model for analysing simultaneously severe and non-severe accidents. To highlight temporal correlations, random effects Negative Binomial (RENB) and Negative Multinomial (NM) models are also developed. Caliendo and Guida (2014) propose a novel bivariate NB model for jointly analysing the severe and total accidents. Both papers apply fixed-effect models with uncorrelated parameters. In Caliendo et al. (2016), instead, univariate random parameters models are proposed both with random effects (i.e., also the intercept was assumed to be random) and without random effects. Also in this case, the parameters are assumed to be uncorrelated. Finally, Caliendo et al. (2019) show that the correlated random parameters model can provide better performance in terms of goodness-of-fit with respect to the corresponding one assuming uncorrelated random parameters, at least for one severity level of crash data.

An alternative to the classical statistical approach for predicting the expected crash frequency may also be the Bayesian method, allowing to include prior information on parameter values in the estimation procedure to obtain a posterior estimate of values of the parameters and some functions thereof. The Bayesian approach has become increasingly popular in the field of traffic safety research for the accessibility of Bayesian models through the use of Markov Chain Monte Carlo (MCMC) methods (Robert and Casella 2010), and the availability of the software package WinBUGS (Spiegelhalter et al. 2005) has encouraged the application of Full Bayesian (FB) methods also in road crashes studies. For example, Dong et al. (2014) employ the FB method to estimate the parameters of a multivariate random parameters zero-inflated regression model. Lee et al. (2015a) adopt the Bayesian Poisson lognormal (PLN) simultaneous equation spatial error model (BPLSESEM) to identify the significant factors for two target variables (pedestrian crash per crash location, and crash-involved pedestrians per residence). Lee et al. (2015b) develop a multivariate Poisson lognormal conditional autoregressive model, at the macroscopic level, for crashes by different transportation modes such as motor vehicle, bicycle, and pedestrian crashes. Farid et al. (2017) investigate the applicability of informative priors in Bayesian statistics to increase the transferability of safety performance functions. Liu and Sharma (2018) employ a multivariate spatial–temporal Bayesian model for handling the unobserved heterogeneity across space, time, and crash-types. Han et al. (2018) propose a Bayesian hierarchical random parameter approach to investigate the variations on crash frequency across different regions. Afghari et al. (2019) investigate the effects of different types of informative priors using the Bayesian inference. Guo et al. (2019) employ the FB method to develop a Poisson lognormal model accounting for both correlations between crash-types and unobserved heterogeneity across observations. More recently, Huang et al. (2019) implement a Bayesian multivariate random parameters model with mixture components for spatially correlated data. Instead, Zeng et al. (2019) investigate the inclusion of spatio-temporal correlation and interaction in a multivariate random parameters Tobit model, and their influence on fitting a real crash rates with different severity outcomes. Park et al. (2020) develop models incorporating the dependence among the multivariate crash counts by modelling multivariate random effects using copulas.

However, also the Bayesian approach has been employed for analysing the unobserved heterogeneity and/or correlations among crash-types with reference prevalently to those occurring on open roads, while crashes happening in tunnels have scarcely been considered. In this respect, it is to be said that tunnels might not have so much unobserved heterogeneity as open roads because the tunnels are closed structures with confined space where some factors that can cause accidents and/or contribute to unobserved heterogeneity are indeed controlled (e.g., there is absence of variations in the weather conditions within tunnels, except in the entrance and exit zones). Furthermore, no difference exists between night and day because lighting conditions are constant in tunnels (except at the portals in daylight). Moreover, in general there are no hazardous points in tunnels, such as junctions and/or intersections, that might cause sudden variations in driving behaviour in response to their presence.

Given the limited body of research dedicated hitherto to the unobserved heterogeneity in tunnels, the goal of this paper is to apply the Bayesian approach to investigating this issue in greater depth, by assuming a PLN model for crashes. We consider also that crash counts could be correlated across collision types because of common influencing factors and unobserved effects. To account for the possible correlations among parameters, a multivariate PLN model is adopted.

In details, we study road crashes of two different types, namely severe and non-severe, occurred in Italian road tunnels, by adopting a hierarchical bivariate PLN model and a Bayesian procedure to estimate its parameters. Possible correlations between parameters are also taken into account. It is worth noting that more recent papers (e.g., Guo et al. 2019; Huang et al. 2019) propose bivariate PLN models in which only some parameters are correlated, namely those corresponding to the same covariate but pertaining to different severity levels of crashes, while the others are still assumed uncorrelated.

The paper is organized as follows: the next section describes the data set under analysis. Subsequently, we present the proposed bivariate model. Then the results of statistical analysis are commented on and compared with the outcomes obtained by a simpler procedure that is based on an alternative method exploiting univariate PLN models, in which the random parameters are assumed to be correlated each other. Finally, some conclusions are provided, together with some hints for future research.

2 Data description

The database of crashes collects data from \(J\) \(=252\) Italian motorway tunnels (226 two-lane and 26 three-lane) with unidirectional traffic only over a monitoring period of \(T=\) \(4\) years (2006–2009), resulting in \(N = J \cdot T = 1008\) observations for both severe and non-severe crashes.

Accidents involve only motorized vehicles, since pedestrians and bicycles are forbidden in motorway tunnels. Table 1 gives accident counts observed during the monitoring period. In this study, 2280 accidents are considered, 757 of which are severe accidents. In two-lane tunnels, 1930 accidents are counted (665 of which are severe accidents), while in three-lane tunnels 350 accidents are registered (92 of which are severe crashes). A decreasing trend in the aforementioned three crash-types over time is also observed, as reported in Table 1.

Table 1 Accident count data observed during the 4-year monitoring period

The length of the tunnels varies from 0.39 to 3.25 km. Unidirectional traffic flow is expressed in terms of the average annual daily traffic AADT per lane, and it ranges between 2250 and 20,380 vehicles/day per lane. The percentage of trucks (i.e., vehicles with weight ≥ 30 kN) ranges from 15.0 to 31.2%. The width of each lane is 3.75 m, and the presence of a sidewalk (with a width of about 1.0 m) was observed only in 143 two-lane tunnels. In all the tunnels investigated the emergency lane was absent. The longitudinal slope is variable in 90 tunnels (83 two-lane and 7 three-lane) with positive and negative gradients, while it is constant in the remaining 162 tunnels (143 two-lane and 19 three-lane). The mechanical ventilation system, which is longitudinal with jet-fans, was observed in 55 two-lane tunnels and absent in the remaining 197 tunnels.

It is worth noting that the number of crashes reduces with time between 2006 and 2009, although the average annual daily traffic per lane and the percentage of trucks are stable over time. This outcome is likely to be due to: (i) the increasing use of electronic speed control systems (tutors) on Italian motorways containing the investigated tunnels (e.g., the presence of the tutor might affect driving behaviour making a reduction of speeds possible); (ii) the introduction in 2003 of the driver's license with a demerit points system for violations of the Highway Code (e.g., the point penalties on driving license due to violations of traffic law might also affect driving behaviour and a reduction in accidents might be expected); (iii) the positive effects of the implementation in Italy of the European Directive 2004/54/EC after October 2006, which led to the birth and/or reinforcement of certain safety measures in tunnels.

The database used in the present paper is an extended version of the one analyzed in Caliendo et al. (2019), by the inclusion of the three-lane tunnels data.

2.1 Set of variables

The dependent variables analyzed in the present work are the number of severe and non-severe accidents (property damage only) per year (crash frequency).

The independent variables considered are the most significant for road engineers: the average annual daily traffic AADT per lane, the length L, the percentage of trucks %Tr, the sidewalk SW, the longitudinal slope LS, the mechanical ventilation MV, and the number of lanes LN. The variables SW, LS, MV, and LN represent qualitative predictors coded as dummy variables: SW = 1 if a sidewalk is present, 0 otherwise; LS = 1 if the longitudinal slope of tunnel varies with ascending and descending gradient, 0 otherwise; MV = 1 if there is the mechanical ventilation, 0 otherwise; LN = 1 if there are three lanes, 0 otherwise. Summary statistics of tunnel length, AADT per lane and percentage of trucks are given in Table 2.

Table 2 Summary statistics of variables

In order to evaluate the association between pairs of variables, we compute the correlation matrix in Table 3, by using Pearson’s correlation coefficient being all the reported variables either real-valued or dichotomous. It is evident that all the independent variables reported in Table 3 are weakly correlated to each other.

Table 3 Correlation matrix of the predictors AADT, L, %Tr, SW, LS, MV, and LN

Aiming at showing also the year effect on the occurrence of severe and non-severe accidents in tunnels within the monitoring period 2006–2009, the year variable is also considered as a predictor. The year variable is coded by introducing the dummy variables Y2007, Y2008, and Y2009, after assuming year 2006 as baseline. We notice that AADT and %Tr do not show a temporal trend, and the other regressors do not change over time (during the monitoring period).

All the \(K = 10\) explanatory variables mentioned above are used to build the road tunnel crash models presented in the next Sections, where independent variables AADT, %Tr and L are considered in logarithmic (base 10) form, in line with the technical literature on tunnel crashes (Caliendo et al. 2016).

3 Modelling approach

Crash data are discrete, non-negative random count data, that are commonly modelled by the Poisson distribution, for which the variance is equal to the mean. This relationship is often violated in crash counts, that shows an empirical variance greater than the sample mean (overdispersion), as a consequence of the strength of some influencing factors. To account for overdispersion, the Poisson lognormal (PLN) model has been proposed in many recent studies to model the variation of the average, by replacing traditional negative binomial (Poisson Gamma) approaches, as detailed in Sect. 1.

As already remarked, the number of severe and non-severe crashes occurring in the same tunnel in different years can not be considered as stochastically independent, since they share the effects of countermeasures implemented in that specific road tunnel over time. More precisely, crashes represent a cluster of observations that share common, even if unknown, random effects. As highlighted in Sect. 1, also the influence of explanatory variables on accidents might vary from tunnel to tunnel, i.e. some heterogeneity on risk factors might be present across tunnels. To cope with this heterogeneity, all the parameters of the model can vary from tunnel to tunnel, and they can be correlated to each other due to the interplay of the risk factors.

A way to account for data-level variability and tunnel-level one is to use a hierarchical (often named multilevel) model, where the first level models the overdispersed crash counts and the second level accounts for the variability of parameters across tunnels. Some applications of hierarchical approaches can be found in Furlan (2008), Arima et al. (2012), Han et al. (2018). In order to jointly model all the crash-types, we focus on a multivariate model.

For the aforementioned reasons, a hierarchical multivariate Poisson lognormal model with correlated random parameters is developed in our work for jointly analyzing \(M\) different crash levels. It is applied to the available dataset containing \(M = 2\) different crashes (severe and non-severe) but can be easily generalized to any \(M\) value.

The inferential procedure for the proposed model must estimate a large number of parameters, as detailed in the next Section. To cope with so a complex model, Bayesian inference is preferred (Gelman and Hill 2007), since it provides a reasonable trade-off between computational burden and ease of implementation. Multilevel/hierarchical Bayesian models can be profitably estimated by the MCMC algorithms implemented in many software tools, such as WinBUG or OpenBUG that are also freely available. This feature, together with the possibility to include prior information of experts (if available), lead us to choose a Bayesian approach to the proposed road tunnel crash model.

3.1 Model description

In this Section, we define the proposed Bayesian hierarchical multivariate Poisson lognormal (regression) model with correlated random parameters, by specifying each level.

Level 1: observed crash counts mode

Let \({\varvec{Y}}_{i} = \left( {Y_{i}^{1} , \ldots ,Y_{i}^{M} } \right)\) denote observation vector \(i\) \(\left( {i = 1, \ldots ,N} \right)\) containing the crash counts \(Y_{i}^{m}\) for collision severity \(m\), \(\forall m \in \left\{ {1, \ldots ,{ }M} \right\}\), in the same year, where \(N\) is still the number of observations for both severe and non-severe crashes (see Sect. 2). The road tunnel \(j\) corresponding to observation vector \(i\) is denoted henceforth as \(j\left[ i \right]\).

For a given crash-type \(m\), at the first level of the hierarchy, conditional on the mean \(\lambda_{i}^{m}\), vector element \(Y_{i}^{m}\) of observation vector \(i\) is modelled as:

$$Y_{i}^{m} |\lambda_{i}^{m} { }\sim{ }Poisson\left( {\lambda_{i}^{m} } \right)$$
(1)

and \(\lambda_{i}^{m}\) is given by the following link function:

$${\text{log}}\left( {\lambda_{i}^{m} } \right) = { }\beta_{0}^{m} + \mathop \sum \limits_{k = 1}^{K} \beta_{kj\left[ i \right]}^{m} X_{ik} + \varepsilon_{i}^{m}$$
(2)

Parameter \(\beta_{0}^{m}\) is the intercept of the model for crash-type \(m\), \(K\) is still the number of covariates, while \({\varvec{\beta}}_{j\left[ i \right]}^{m} = \left( {\beta_{1j\left[ i \right]}^{m} , \ldots ,\beta_{Kj\left[ i \right]}^{m} } \right)^{T}\) is the \(K\)-dimensional random vector containing the (possibly) correlated random parameters in road tunnel \(j\left[ i \right]\) for crash-type \(m\). \(X_{ik}\) is the \(\left( {i,k} \right)\) entry of the \(N \times K\) design matrix \({\mathbf{X}}\) containing all the explanatory variables data; \({\mathbf{X}}\) depends on traffic, road tunnel characteristics and year and does not vary across crash-types. \(\varepsilon_{i}^{m}\) is the so-called “random effect” accounting for overdispersion in crash count data for crash-type \(m\) and is an element of the random vector \({\varvec{\varepsilon}}_{i} = \left( {\varepsilon_{i}^{1} , \ldots ,\varepsilon_{i}^{M} } \right)^{T}\), assumed as independent of \({\varvec{\beta}}_{j\left[ i \right]}^{m}\), that is modelled as a multivariate normal (MVN) with mean vector \({\bf 0}\) (an \(M\)-dimensional zero vector) and \(M \times M\) variance–covariance matrix \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}}\):

$${\varvec{\varepsilon}}_{i} | {{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}} \sim MVN \left( {{\bf 0}, {{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}} } \right)$$
(3)

Level 2: tunnel-level model

In order to model tunnel-level variability, we assume that all the regression coefficients related to the \(K\) covariates for the \(M\) different crash-types under examination, whose number is \(D = M \cdot K\), could be correlated to each other.

Thus, the distribution of vector \({\varvec{\beta}}_{j} = \left( {\beta_{1j}^{1} , \ldots ,\beta_{Kj}^{1} , \ldots ,\beta_{1j}^{M} , \ldots ,\beta_{Kj}^{M} } \right)^{T}\), containing all those random coefficients, is assumed to be an MVN, viz.

$${\varvec{\beta}}_{j} | {\varvec{\mu}},{{\varvec{\Sigma}}} \sim MVN \left( {{\varvec{\mu}},{{\varvec{\Sigma}}}} \right) \;\;\;\;\;\; j = 1, \ldots ,J$$
(4)

where \({\varvec{\mu}} = \left( {\mu_{1}^{1} , \ldots ,\mu_{K}^{1} , \ldots ,\mu_{1}^{M} , \ldots ,\mu_{K}^{M} } \right)^{T}\) is a \(D\)-dimensional vector containing all the expected values of those random coefficients, i.e. \(\mu_{k}^{m} = {\mathbb{E}}\left( {\beta_{k}^{m} } \right)\) (\(m = 1, \ldots , M\), \(k = 1, \ldots ,K\)), while \({{\varvec{\Sigma}}}\) is the \(D \times D\) variance–covariance matrix of those random coefficients.

Level 3: priors

According to the Full Bayesian (FB) paradigm, it is necessary to specify the prior distributions for the intercept vector \({\varvec{\beta}}_{0} = \left( {\beta_{0}^{1} , \ldots ,\beta_{0}^{M} } \right)^{T}\) and for hyperparameters within the models (1)-(4) to obtain Bayesian point and interval estimates of coefficients. Prior distributions reflect the prior belief of an expert about the parameters to be estimated. In the presence of sufficient prior knowledge for the individual parameters of interest, informative prior distributions are usually specified (Bedrick et al. 1996; Schluter et al. 1997). However, when prior knowledge is not available, it is customary to define the so-called non-informative (diffuse) priors expressing vague information on variables. In this study, the following non-informative priors are used for parameters and hyperparameters, in line with those commonly adopted in road accident technical literature.

A diffuse prior distribution is assumed for the intercept vector \({\varvec{\beta}}_{0}\) that is modelled to have an MVN distribution:

$${\varvec{\beta}}_{0} |{\bf 0}, {{\varvec{\Sigma}}}_{0} \sim MVN \left( {{\bf 0}, {{\varvec{\Sigma}}}_{0} } \right)$$
(5)

with an \(M\)-dimensional mean vector \({\bf 0}\) and an \(M \times M\) variance–covariance diagonal matrix \({{\varvec{\Sigma}}}_{0}\) with large variance values, namely \({{\varvec{\Sigma}}}_{0} = v_{0} {\mathbf{I}}_{M}\), where \(v_{0} = 1000\) and \({\mathbf{I}}_{M}\) is the \(M \times M\) identity matrix.

A non-informative conjugate prior for variance–covariance matrix \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}}\) in (3) is assumed (Spiegelhalter et al. 1996): an inverse Wishart prior distribution \(IW\left( {r_{\varepsilon } \cdot {\mathbf{P}}_{{\varvec{\varepsilon}}} ,{ }r_{\varepsilon } } \right)\) is specified for \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}}\), where \({\mathbf{P}}_{{\varvec{\varepsilon}}}\) is an \(M \times M\) matrix representing a hint for variance–covariance matrix while \(r_{\varepsilon }\) is a scalar “degrees of freedom” parameter (Carlin and Louis 2000). Typically, \(r_{\varepsilon }\) is an integer \(r_{\varepsilon } \ge M\); the smaller \(r_{\varepsilon }\), the more diffuse the prior distribution of the elements of \({\mathbf{P}}_{{\varvec{\varepsilon}}}\). For \(M = 2\) (severe and non-severe crash-types), in road crashes literature (Aguero-Valverde 2013; Huang et al. 2019) common choices are \(r_{\varepsilon } = M = 2\) degrees of freedom and

$${\mathbf{P}}_{{\varvec{\varepsilon}}} = \left[ {\begin{array}{*{20}c} {0.1} & {0.005} \\ {0.005} & {0.1} \\ \end{array} } \right]$$
(6)

since small values of variances and correlations are usually obtained to model overdispersion in accident data. Accordingly, we select the same values for the inverse Wishart prior of \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}}\) in our Bayesian inferential procedure, namely \(r_{\varepsilon } = 2\) and \({\mathbf{P}}_{{\varvec{\varepsilon}}}\) as in (6).

Also for the parameters of the regression coefficients vector \({\varvec{\beta}}_{j}\) in (4), a diffuse prior is specified. The prior distribution of the hyperparameter vector \({\varvec{\mu}}\) in (4) is assumed to be an \(MVN\left( {\bf 0},{{{\varvec{\Sigma}}}_{{\varvec{\mu}}} } \right)\), with a \(D\)-dimensional mean vector equal to a zero vector, and \({{\varvec{\Sigma}}}_{{\varvec{\mu}}}\) being a diagonal \(D \times D\) variance–covariance matrix with large variance values, namely \({{\varvec{\Sigma}}}_{{\varvec{\mu}}} = v_{{\varvec{\mu}}} {\mathbf{I}}_{D}\), where \(v_{{\varvec{\mu}}} = 1000\).

By following the arguments discussed for the selection of the prior of \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}}\), we adopt for \({{\varvec{\Sigma}}}\) in (4) a non-informative conjugate prior by specifying an inverse Wishart prior distribution \(IW\left( {r \cdot {\mathbf{P}},{ }r} \right)\) for \({{\varvec{\Sigma}}}\), where \(r = D\) guarantees a diffuse prior and \({\mathbf{P}}\) is the \(D \times D\) matrix:

$${\mathbf{P}} = \left[ {\begin{array}{*{20}c} {0.1} & \cdots & {0.005} \\ \vdots & \ddots & \vdots \\ {0.005} & \cdots & {0.1} \\ \end{array} } \right]$$

with low variance values and correlations.

3.2 Bayesian estimation procedure

In line with most FB implementations, the posterior distributions were obtained using Markov Chain Monte Carlo (MCMC) sampling techniques available in OpenBUGS (Lunn et al. 2009), which is an open-source statistical software that implements different families of MCMC algorithms, such as Gibbs, Metropolis and slice sampling. For our model, the Gibbs sampler is adopted for the good convergence characteristics and reduced execution times; the MCMC algorithm provides a sample of posterior vectors (containing all the parameters values) that enable us to compute point and interval estimates of each parameter in the model.

4 Model checking and comparison

After model estimation, model fitting and predicting capabilities have to be checked. Cross-validation is considered one of the most useful tools for evaluating model fitting and test error. However, in Bayesian framework the approach to implement cross-validation must be carefully selected. Indeed, leave-one-out cross-validation (where the model must be re-fit a number of times equal to the cardinality of the sample) can be really highly computationally expensive, because even a single Bayesian model fitting procedure could require a long MCMC computation.

In order to mitigate this problem, we adopt a \(k\)-fold cross-validation procedure with \(k=5\), a value guaranteeing a good bias-variance tradeoff (Hastie et al. 2009). The \(k\)-fold cross-validation is performed by the data set into \(k\) groups (folds), where \(k-1\) are used to fit the model and the remaining one, say group \(l\), is treated as a validation set, by which we compute the Root Mean Squared Error (RMSE) of the model as a validation statistic, viz.

$$RMSE_{l} = \sqrt {\frac{1}{{n_{l} }}\mathop \sum \limits_{i} \frac{1}{M}\mathop \sum \limits_{m = 1}^{M} \left( {Y_{i}^{m} - \hat{Y}_{i}^{m} { }} \right)^{2} }$$
(7)

where \(Y_{i}^{m}\) is the crash count \(i\) of the accidents of severity level \(m\) (here \(M = 2\), namely severe or non-severe) in data subset \(l\), \(\hat{Y}_{i}^{m}\) denotes the value predicted by the model for severity level \(m\) in the same subset \(l\), and \(n_{l}\) is the sample size of the group \(l\). The procedure is repeated \(k\) times; a different group \(l = 1, \ldots ,k\) is selected as a validation set each time, and an \(RMSE_{l}\) is obtained. The \(k\)-fold cross-validation RMSE estimate is then given by:

$$RMSE = CV_{\left( k \right)} = \frac{1}{k}\mathop \sum \limits_{l = 1}^{k} RMSE_{l}$$
(8)

In general, models with a small RMSE values shows good predicting capabilities.

We use Deviance Information Criterion (DIC), introduced by Spiegelhalter et al. (2002) and implemented in OpenBUGS, to compare the proposed Bayesian hierarchical multivariate PLN with simpler independent hierarchical univariate PLN models that fit single crash-types data, that do not exploit the interdependencies among counts of different severity levels occurring in the same road tunnels. We adopt Bayesian estimation also for those univariate models.

4.1 Hierarchical univariate PLN crash count models with correlated random parameters

Hierarchical univariate PLN crash count models with correlated random parameters, one for each severity level, can be easily derived by considering model presented in Sect. 3.1, where counts \(Y_{i}^{m}\) are independent of \(Y_{i}^{\ell }\) for each severity level \(\ell \ne m\). In order to reach this goal, we assume that \({\varvec{\beta}}_{j}^{m} = \left( {\beta_{1j}^{m} , \ldots ,\beta_{Kj}^{m} } \right)^{T}\) is independent of \({\varvec{\beta}}_{j}^{\ell }\) \(\forall \ell \ne m\), and \(\varepsilon_{i}^{m}\) is independent of \(\varepsilon_{i}^{\ell } \;\; \forall \ell \ne m\).

In the first level, \(Y_{i}^{m}\) (\(i = 1, \ldots ,N\)) for each severity level \(m = 1, \ldots ,M\) is modelled according to (1) and (2)\(,\) but \(\varepsilon_{i}^{m}\) (still assumed independent of \({\varvec{\beta}}_{j\left[ i \right]}^{m}\)) is modelled as:

$$\varepsilon_{i}^{m} |{ }\sigma_{\varepsilon }^{2} { }\sim{ }N\left( {0,{ }\sigma_{\varepsilon }^{2} } \right)$$
(9)

By collecting eqs. (9) \(\forall m = 1, \ldots ,M\), we retrieve (3) in which \({{\varvec{\Sigma}}}_{{\varvec{\varepsilon}}} = \sigma_{\varepsilon }^{2} {\mathbf{I}}_{M}\).

In level 2, since only coefficients pertaining to the same severity level can be correlated, the distribution of \({\varvec{\beta}}_{j}^{m}\) is given by:

$${\varvec{\beta}}_{j}^{m} | {\varvec{\mu}}^{m} ,{{\varvec{\Sigma}}}^{m} \sim MVN \left( {{\varvec{\mu}}^{m} ,{{\varvec{\Sigma}}}^{m} } \right) \;\;\;\;\;\; j = 1, \ldots ,J,\;m = 1, \ldots ,M$$
(10)

where \({\varvec{\mu}}^{m} = \left( {\mu_{1}^{m} , \ldots ,\mu_{K}^{m} } \right)^{T}\) is a \(K\)-dimensional vector containing the expected values of random coefficients for crash level \(m\), while \({{\varvec{\Sigma}}}^{m}\) is the \(K \times K\) variance–covariance matrix. Again, by collecting eqs. (10) for \(\forall m = 1, \ldots ,M\), we obtain (4) in which \({{\varvec{\Sigma}}} = {\text{diag}}\left( {{{\varvec{\Sigma}}}^{1} , \ldots ,{{\varvec{\Sigma}}}^{M} } \right)\).

Regarding the prior distributions, \(\beta_{0}^{m}\) in (2) is a diffuse prior following, independently of any other parameter, a normal distribution with zero mean and large variance, \(\forall m = 1, \ldots ,M\). More precisely, prior distribution is:

$$\beta_{0}^{m} { }\sim{ }N\left( {0,{\upnu }_{0} } \right)$$
(11)

where \({\upnu }_{0} = 1000\). An improper vague prior is assumed for \(\sigma_{\varepsilon }^{2}\) in (9), i.e. \(p\left( {\sigma_{\varepsilon }^{2} { }} \right) \propto 1/\sigma_{\varepsilon }^{2}\). The prior distribution of the hyperparameter \({\varvec{\mu}}^{m}\) in (10) is assumed to be an \(MVN\left( {\bf0},{{\varvec{\Psi}}} \right)\) \(\forall m = 1, \ldots ,M\), with a mean vector equal to a \(K\)-dimensional zero vector and \({{\varvec{\Psi}}}\) being a diagonal \(K \times K\) variance–covariance matrix with large variance values, namely \({{\varvec{\Psi}}} = v_{m} {\mathbf{I}}_{K}\), where \(v_{m} = 1000\), like \({\varvec{\mu}}\) in (4). One and the same non-informative conjugate prior is specified for \({{\varvec{\Sigma}}}^{m} { }\) in (10) \(\forall m = 1, \ldots ,M\), similar to the prior of \({{\varvec{\Sigma}}}\) in (4): the prior distribution is an inverse Wishart \(IW\left( {s \cdot {\mathbf{Q}},{ }s} \right)\), where \(s = K\) guarantees a diffuse prior and \({\mathbf{Q}}\) is the \(K \times K\) matrix:

$${\mathbf{Q}} = \left[ {\begin{array}{*{20}c} {0.1} & \cdots & {0.005} \\ \vdots & \ddots & \vdots \\ {0.005} & \cdots & {0.1} \\ \end{array} } \right]$$

5 Analysis of results

We implemented the model presented in Sect. 3 in OpenBUGS, for the dataset discussed in Sect. 2, containing crashes with \(M = 2\) severity levels (severe and non-severe), thus our model turns out to be a Bayesian hierarchical bivariate Poisson lognormal model with correlated random parameters.

We verified MCMC convergence by launching two separate chains with different starting values, where each chain has 2,000,000 iterations. The first 200,000 iterations were discarded as burn-in period and a thinning interval equal to 20 was applied to guarantee negligible correlations between consecutive values of the MCMC process, thus we obtain a posterior sample of size \(n = 90,000\). Convergence to the stationary posterior distribution was assessed using the Brooks-Gelman-Rubin (BGR) statistic (Brooks and Gelman 1998), being its value less than 1.1. Convergence was also verified by visual inspection of the MCMC trace plots of all model parameters, as well as by evaluating their autocorrelation functions.

The sample of posterior vectors (containing parameters values) generated by the MCMC algorithm allows to compute the point and interval estimate for each parameter. The former is the posterior mean of each parameter (the expected value on the posterior distribution), for instance for parameter \(\mu_{k}^{m}\) is given by \({\mathbb{E}}[\mu_{k}^{m} |data] \cong 1/n\mathop \sum \limits_{u = 1}^{n} \left( {\mu_{k}^{m} } \right)_{u}\), computed by the sample mean of the corresponding elements \(u = 1, \ldots ,n\) of the posterior sample; the latter is the 90% equal-tailed credible interval of the parameter and is given by the interval between the 5th and 95th percentile of the posterior sample.

Since we are interested in tunnel-level behavior of regression parameters, we estimate the posterior mean of \(\mu_{k}^{m}\) (for each element of the vector \({\varvec{\beta}}_{j}\)) and its 90% equal-tailed credible interval. We also compute the point estimate of the standard deviation of the regression parameters, given by the posterior mean of the square root of the diagonal elements of \({{\varvec{\Sigma}}}\) in (4), to evaluate their variability among tunnels.

Table 4 contains, for severe and non-severe crashes, posterior estimates of the mean and of the standard deviation of all the regression coefficients and the intercept, together with the 90% equal-tailed credible interval of the mean.

Table 4 Estimation results of the bivariate Poisson lognormal model with correlated random parameters

In details, the coefficients of AADT per lane are 3.317 and 2.657 for severe and non-severe crashes, respectively; in addition, the coefficients of L are 0.9256 for severe accidents and 1.709 for non-severe ones. The expected frequency of severe (or non-severe) crashes increases with the average annual daily traffic AADT per lane, the percentage of trucks %Tr, and the tunnel length L. The most likely explanations are the following: as AADT per lane and %Tr increase, the crash frequency is expected to increase since for keeping the desired speeds, the lane changing and overtaking movements of vehicles are more frequent; by increasing L, the crash frequency likely increases for a progressive reduction of the drivers' concentration in long tunnels.

Table 4 shows also that the expected frequency of severe accidents decreases with the presence of the sidewalk SW in the tunnel, due to a reduction in disturbing effects on driving behavior that might be caused by the tunnel wall when it is too close to the traffic lane. The modelling results confirmed also, in accordance with the crash data, a reduction in crash frequency (for both crash-types) over years from 2006 to 2009. This decrease may be due to the positive effects of more stringent road-safety policies over time; as well as to the implementation of some countermeasures in the tunnels after October 2006, i.e. the date of the entry into force in Italy of the European Directive 2004/54/EC on minimum safety requirements for tunnels (European Parliament and Council 2004). When the longitudinal slope of tunnel is not constant (LS = 1), different signs of the estimation points was obtained in relation to the severity of crashes (i.e., LS presents a positive sign for the severe accidents and negative for the non-severe ones). However, we comment on the severe accidents only, since for this crash-type the covariate LS is statically significant. One possible explanation might be that the non-constant slope has a minor effect on the reduction in the trucks’ average speed (i.e., the average speed of trucks might be more or less the same as in flat tunnels) when compared to upgrade (the grade resistance force hinders the truck motion) or downgrade (trucks slow down to prevent brake failure) tunnels. This means that the average speed of trucks continues to remain quite high. Also the speed of passenger cars is quite high, since this is not influenced by variations of the longitudinal slope due to the greater ratio between the engine power and vehicle weight compared to that of truck. As a result, an increase in the severity of consequences of the collisions between vehicles (due to high speed) and/or against the tunnel wall might be expected. Also LN regressor was found to be statistically significant for both crash-types, with opposite influence on severe and non-severe accidents. The addition of a lane (e.g., passing from two to three lanes) is generally motivated by the need to obtain better traffic-flow conditions. In these cases, drivers are not generally forced to increase their speed to compensate for the lost time and a decrease in the number of severe crashes (prevalently associated with excessive speeds) is expected. On the other hand, by increasing the number of lanes, drivers have more opportunities for lane change and consequently there might be more traffic conflicts, resulting in an increase in the number of accidents (even if it might cause material damage only).

By looking at standard deviations of the regression parameters with a significant influence on tunnel crashes, some of them are comparable to mean values showing a considerable variability among tunnels. Such parameters are those related to L, SW, LS and LN for severe crashes, while a smaller variability is noticed for parameters related to non-severe crashes.

Additional considerations can be offered by inspecting the correlation matrix of regression coefficients of bivariate PLN model in Table 5, that reports only the main diagonal blocks, being the most significant ones to be analyzed from engineering perspective: some off-diagonal coefficients are reported in italic in case of (i) non-negligible correlation (greater than 0.3) between parameters; (ii) pertaining to covariates showing a relevant influence on crashes, namely having a 90% credible interval on the mean not containing the value 0 (see Table 4). With reference to severe accidents that are prevalently more relevant for causing victims, one may note from the content of Table 5, for example, for some statistically significant covariates that: (i) the non-constant slope (LS = 1) is negatively correlated to the length L (-0.3061), which means that it might mitigate the effects of the L on increasing the frequency of this crash-type; (ii) LS presents, instead, a moderate positive correlation with SW (0.5317), and also LN has a moderate positive correlation with SW (0.4060), suggesting further investigations on the presence of sidewalks in tunnels with non-constant slope or three lanes.

Table 5 Main diagonal blocks of the correlation matrix of regression coefficients of bivariate PLN model

5.1 Comparison between bivariate PLN model and two univariate PLN models

The proposed bivariate PLN model is more complex with respect to PLN models that does not exploit the dependencies among coefficients and crash count of different severity levels, resulting in longer execution times of the MCMC algorithm as a consequence of a greater number of parameters to estimate. However, the bivariate PLN model offers some key advantages, such as better fits to data and reduced prediction error.

PLN models with correlated parameters provides better performance with respect to those assuming independent regression parameters (Anastasopoulos and Mannering 2011). However, it is not clear if the proposed bivariate hierarchical PLN model shows some advantages, in terms of goodness-of-fit and prediction capability, with respect to two independent univariate hierarchical PLN models (see Sect. 4.1) for severe and non-severe accidents with correlated parameters.

In order to analyze tunnel-level behavior of regression parameters, in Table 6 we present the posterior mean of \({\mu }_{k}^{m}\) and its 90% equal-tailed credible interval for each parameter computed by the two independent univariate PLN models for severe and non-severe crashes. We also provide the point estimate of the standard deviation of the regression parameters, given by the posterior mean of the square root of the diagonal elements of \({{\varvec{\Sigma}}}^{m}\) in (10). Table 6 also contains, for both crash-types, the posterior estimates of the mean and of the standard deviation of the intercept \(\beta_{0}^{m}\), together with the 90% equal-tailed credible interval of the mean.

Table 6 Estimation results of independent univariate Poisson lognormal with correlated random parameters

By analyzing standard deviations and means of the regression parameters computed by univariate PLN models, a smaller variability of parameters values among tunnels seems to be present with respect to the variability resulting from the estimates obtained by the bivariate PLN model, being the standard deviation value much less than mean value for all the parameters.

The comparison of Tables 4 and 6 shows that the proposed bivariate model is able to highlight the importance of the presence of the sidewalk to reduce severe accidents (posterior mean significantly less than 0), as expected by road engineers. On the contrary, the univariate models for both crash-types counts are not able to show that the presence of the sidewalk is relevant to reduce accidents, since both SW credible intervals contain the 0 value. This outcome shows that our model can reveal also this weak effect that univariate models can not detect.

Moreover, the DIC value of the bivariate model reported in Tables 4 is 4638, which is much lower than that obtained by the use of two independent univariate PLN models (4725, see in Table 6). The latter DIC value has been computed by summing DIC values of single univariate PLN models, since DIC is additive in the case of independent models and prior distributions (El-Basyouny and Sayed 2009). Therefore, our model provides much better goodness-of-fit performance with respect to the use of two independent PLN models, by showing a drop-off of 87 to the DIC value.

Furthermore, by comparing the fivefold cross-validation RMSE values reported in Tables 4 and 6, it is immediate to notice that the bivariate PLN model has a smaller RMSE value than those provided by both the univariates models (the former has an RMSE = 1.454, while the latter models have RMSE = 1.572 and RMSE = 2.116 for severe and non-severe crashes, respectively), showing that the proposed model has also better prediction performance with respect to single independent univariate PLN models with correlated random parameters.

6 Conclusions

This work proposes a Bayesian bivariate Poisson lognormal (PLN) hierarchical model with correlated parameters for the joint analysis of severe (fatality and injury only accidents) and non-severe crashes (property damage only) occurred in Italian road tunnels.

The proposed model provides smaller RMSE and DIC values than those obtained by the use of two independent univariate PLN models with correlated random parameters.

It has been highlighted that the frequency of severe and non-severe crashes increases according to the annual daily traffic AADT per lane, the tunnel length L, and the percentage of trucks %Tr. This might be attributed to the fact that as AADT and %Tr increase, the changing of lane and overtaking movements are more frequent, so that crashes are expected to increase; by increasing L, crashes are also expected to increase for a decrease in the drivers' concentrations. The presence of the sidewalk SW, instead, reduces accidents because it mitigates the disturbing effects on driving behavior caused by the tunnel wall. The importance of the presence of the sidewalk is not highlighted by adopting independent univariate PLN models, while our bivariate PLN model reveals that severe accidents can be reduced by introducing a sidewalk and, thus, can offer better insight into crash data collected in tunnel roads.

The modelling results confirmed also, in accordance with the crash data, a reduction in crash frequency over the years from 2006 to 2009. The correlation matrix of the regression parameters shows that the non-constant slope is negatively correlated to the tunnel length, which means that it might mitigate the effects of the L on increasing the frequency of severe crashes. The LS and LN present, instead, a positive correlation with SW.

The proposed model allows road engineers to identify the variables on which prevalently to focus data collection, as well as to make more appropriate proposals in order to reduce the probability of accidents in tunnels. Accidents can be lowered by designing tunnels not too long, whose longitudinal slope is possibly non-constant, and by introducing a sidewalk. It is worth remarking that the presence of the sidewalk plays a dual role: (i) it reduces the disturbing effects on driving behaviour caused by the tunnel wall when it is too close to the traffic lane; (ii) the sidewalk can also be used as an escape route by tunnel users during their evacuation process following a disruptive event (e.g., traffic accident or fire). Moreover, our approach can also be adopted by Tunnel Management Agency (TMA) to evaluate probable variations in the frequency of severe and non-severe crashes that might be due to changes in traffic flow by day of the week. Our model can also help to define a control system on traffic flows entering the tunnel, able also to suggest an alternative itinerary if necessary.