Multivariate random parameters zero-inflated negative binomial regression for analyzing urban midblock crashes
Introduction
Traffic crashes can be divided into junction crashes and non-junction crashes based on where they occur (National Center for Statistics and Analysis, 2017). Non-junction crashes, also referred to as midblock crashes, are crashes that occur on roadway segments. In 2015, they accounted for 41.7% of all crashes and 63.3% of fatal crashes in the United States (National Center for Statistics and Analysis, 2017). Thus, reducing midblock crashes is critical for improving traffic safety. Although midblock crashes are usually not directly influenced by junctions, they are greatly influenced by traffic operation and roadway geometric factors, which are much more complex on urban roadways than on rural roadways. On the one hand, urban roadway segments usually carry large traffic volumes and serve diverse traffic demands, which might increase crash opportunities; for example, an increase in the number of crosswalks might increase the frequency of pedestrian crashes. On the other hand, urban development might limit or even reduce available roadway space, which might also increase crash risk; for example, vehicle lanes may be narrowed to make room for bike lanes and on-street parking. This predicament requires transportation agencies to determine which traffic operation and roadway geometric factors actually influence the frequency of urban midblock crashes so that they can take effective measures to accommodate traffic demands without reducing traffic safety.
Previous studies have shown that important traffic operation and roadway geometric factors influencing midblock crashes include traffic volume (Bonneson and McCoy, 1997, Greibe, 2003, Dumbaugh, 2006, Zhang et al., 2012, Manuel et al., 2014, Ferreira and Couto, 2015), speed limit (Greibe, 2003, Dumbaugh, 2006, Pande et al., 2010), on-street parking (Bonneson and McCoy, 1997, Greibe, 2003), lane width (Greibe, 2003, Manuel et al., 2014), median type (Bonneson and McCoy, 1997, Sawalha and Sayed, 2001), median width (Dumbaugh, 2006), number of lanes (Sawalha and Sayed, 2001, Greibe, 2003, Dumbaugh, 2006), land use (Bonneson and McCoy, 1997, Sawalha and Sayed, 2001, Greibe, 2003), pavement condition (Usman et al., 2010, Xiong et al., 2014, Zeng and Huang, 2014), and access points (Lee et al., 2011, Zeng and Huang, 2014), among others. However, findings have often been inconsistent across studies; that is, the same factor may have different effects in different studies. For example, speed limit was found not to be significant for midblock crash frequencies on a 27-mile urban arterial in Florida Department of Transportation District 5 (Dumbaugh, 2006), whereas it was the most important variable for midblock crash frequencies on a 19.659-mile corridor of U.S. Route 19 in Pasco County, Florida (Pande et al., 2010). This inconsistency implies that, in practice, the effects of some factors on crashes might be location specific. Ignoring this unobserved heterogeneity might produce biased and inefficient parameter estimates, leading to erroneous inferences and predictions (Mannering et al., 2016).
One way to account for unobserved heterogeneity across observations in crash frequency analysis is to adopt random parameters count data models (Lord and Mannering, 2010, Chen and Tarko, 2014, Venkataraman et al., 2014, Barua et al., 2015, Barua et al., 2016, Coruh et al., 2015, Alarifi et al., 2017, Bhat et al., 2017, Chen et al., 2017, Rista et al., 2017). Whereas fixed parameters models assume that each factor has the same effect on all observations, random parameters models can capture observation-specific effects of factors on crash frequency. They have also been widely applied in crash injury severity analyses (Russo et al., 2014, Zhao and Khattak, 2015, Zhao and Khattak, 2017, Behnood and Mannering, 2016, Behnood and Mannering, 2017a, Behnood and Mannering, 2017b, Naik et al., 2016, Anderson and Hernandez, 2017, Fountas and Anastasopoulos, 2017, Seraneeprakarn et al., 2017) and crash rate analyses (Anastasopoulos, 2016). In particular, for data in which one entity has multiple observations, such as panel data, group-specific random parameters models may be adopted to account for heterogeneity among groups (Wu et al., 2013, Sarwar et al., 2017). More details about random parameters formulations can be found in Mannering et al. (2016).
Crash data can usually be divided into multiple types based on different criteria. For example, midblock crashes can be divided based on the type of collision: rear-end crashes, right-angle crashes, sideswipe (same direction) crashes, single-vehicle crashes, overturn crashes, and so on. A single factor might be expected to have different effects on different collision types, causing different outcomes. Thus, identifying the specific significant factors for each collision type is important for transportation agencies so that they can take targeted countermeasures to reduce specific types of collision. When these crashes are jointly analyzed, multivariate count data models are necessary, as univariate models may produce biased and inefficient results because the unobserved heterogeneity often present across crash types is ignored (Huang et al., 2008, Dong et al., 2014a, Mannering et al., 2016). Most multivariate count data models in the literature were derived from the multivariate Poisson log-normal (MVPLN) model (Ma et al., 2008, El-Basyouny and Sayed, 2009, Aguero-Valverde and Jovanis, 2010, Barua et al., 2014, Zhan et al., 2015, Serhiyenko et al., 2016, Huang et al., 2017, Osama and Sayed, 2017, Zhao et al., 2017, Wang et al., 2018), which is flexible enough to accommodate various correlations among crash types but does not work well for crash data with excess zeros (Dong et al., 2014a). In addition to the multivariate Poisson log-normal model, the natural extensions of the Poisson and negative binomial (NB) models to multivariate data, i.e., the multivariate Poisson (MVP) model (Johnson et al., 1997, Ma and Kockelman, 2006) and the multivariate negative binomial (MVNB) model (Anastasopoulos et al., 2012, Chen et al., 2017), have also been used in some studies.
The multivariate Poisson/negative binomial models assume positive correlations across crash types, but they cannot deal with crash data with excess zeros either, as the marginal distribution per crash type is still a Poisson/negative binomial model.
Zero-inflated models are often adopted for univariate count data with excess zeros (Lambert, 1992, Lord et al., 2005). The excess zeros in crash frequency data can be explained in two ways for zero-inflated models. One explanation is that there is a two-state crash-generating process: (i) a normal count state and (ii) an accident-free state, which can be thought of as a nearly safe state, with accidents occurring extremely rarely (Malyshkina and Mannering, 2010). The other explanation is that there is a two-state crash-reporting process: (i) a crash-underreporting state, in which accidents did occur but were not reported for some reason, for example minor crashes that did not need to be reported or hit-and-run crashes, and (ii) a normal crash-reporting state, in which all accidents that occurred were reported. This explanation applies to many scenarios, as crash underreporting has been found to be common in practice (Hauer and Hakkert, 1988, Elvik and Mysen, 1999, Yamamoto et al., 2008, Lord and Mannering, 2010, Yannis et al., 2014). Both explanations may justify the application of zero-inflated models in our case, although it is difficult to determine which one holds from the observed data alone. In cases for which crash observations at each level of classification are characterized by a significant number of zero occurrences, the zero-inflated versions of the multivariate Poisson and negative binomial models, i.e., the multivariate zero-inflated Poisson (MVZIP) model (Li et al., 1999) and the multivariate zero-inflated negative binomial (MVZINB) model, are recommended. In traffic safety studies, the multivariate zero-inflated Poisson model was first used to examine the crash frequency at signalized intersections in Tennessee, and it was found to perform better than the univariate zero-inflated Poisson (UZIP) and multivariate Poisson log-normal models in terms of goodness of fit and prediction accuracy (Dong et al., 2014b).
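The two-state idea can be made concrete with a small sketch of a univariate zero-inflated negative binomial probability mass function. This is only an illustration under stated assumptions: the NB2 parameterization (variance = mean + alpha * mean^2), the function names, and the parameter names `mu`, `alpha`, and `pi_zero` are hypothetical choices made here, not notation taken from the studies cited above.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial pmf, assuming the NB2 form: variance = mu + alpha * mu**2."""
    r = 1.0 / alpha  # inverse overdispersion parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi_zero):
    """Zero-inflated NB pmf.

    With probability pi_zero the site is in the zero state (always 0 crashes);
    otherwise the count follows the ordinary negative binomial distribution.
    """
    count_part = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    # Zeros can come from either state, so the zero-state mass is added at y = 0.
    return pi_zero + count_part if y == 0 else count_part
```

Under these assumptions, the probability of a zero count is always at least as large as under the plain negative binomial model with the same mean and overdispersion, which is how the mixture absorbs excess zeros.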
To account for overdispersion and unobserved heterogeneity across individual sites, Dong et al. (2014a) used the multivariate random parameters zero-inflated negative binomial (MVRPZINB) model in another crash frequency study, for which random parameters were assumed for the count part. Later, Anastasopoulos (2016) also adopted the multivariate random parameters zero-inflated negative binomial model in a crash frequency analysis, assuming random parameters for both the count part and the zero-state part, which makes the model more flexible. In both studies, random parameters models were found to be superior to fixed parameters models in terms of goodness of fit and prediction accuracy.
This paper presents the multivariate random parameters zero-inflated negative binomial model for analyzing urban midblock crashes by collision type. Here, midblock crashes refer to non-junction crashes that occurred on urban midblock segments bounded by signalized intersections. The objectives of this study were: (i) to identify important traffic operation and roadway geometric factors influencing urban midblock crash frequencies by collision type and (ii) to conduct a thorough review of the performance of the multivariate random parameters zero-inflated negative binomial model in accounting for unobserved heterogeneity produced by correlations across crash types, correlations across sites, excess zeros, and overdispersion. The results demonstrate the superiority of the multivariate random parameters zero-inflated negative binomial model over many common crash frequency analysis models.
The multivariate zero-inflated negative binomial model
For an m-dimensional observation, $Y = (Y_1, Y_2, \ldots, Y_m)$, the multivariate negative binomial model is defined as (Dong et al., 2014a):

$$Y_j = X_j + X_0, \quad j = 1, 2, \ldots, m,$$

where $m$ is the dimension of $Y$, and $X_0, X_1, \ldots, X_m$ are independent negative binomial variables with respective means $\lambda_0, \lambda_1, \ldots, \lambda_m$.
An m-dimensional multivariate negative binomial model was constructed with independent negative binomial variables. The elements of $Y$ are positively correlated with each other due to the presence of $X_0$, which is
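
The positive correlation induced by a shared component can be checked with a short simulation. The sketch below assumes a common-term construction in which each crash-type count is the sum of a type-specific negative binomial variable and one negative binomial variable shared by all types. The gamma-Poisson sampler, the single shared overdispersion value `alpha`, and all function names are illustrative assumptions, not the authors' estimation procedure.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_nb(mean, alpha, rng):
    """Draw one negative binomial variate as a gamma-Poisson mixture.

    The gamma rate has shape 1/alpha and scale alpha*mean, so its mean is `mean`
    and the resulting count has variance mean + alpha * mean**2.
    """
    rate = rng.gammavariate(1.0 / alpha, alpha * mean)
    return sample_poisson(rate, rng)


def simulate_mvnb(type_means, common_mean, alpha, n_obs, seed=0):
    """Simulate counts where every crash type shares one common NB term."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        shared = sample_nb(common_mean, alpha, rng)  # common term for this site
        rows.append([sample_nb(mu, alpha, rng) + shared for mu in type_means])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `simulate_mvnb([2.0, 1.0], 1.0, 0.5, 5000)` returns 5,000 two-type observations; passing the two columns to `pearson` should give a clearly positive value, because the covariance between any two types equals the variance of the shared term.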
Data description
Yearly crash frequency data for 1,506 directional urban midblock segments in Lincoln and Omaha, Nebraska, from 2003 to 2012 were collected from the Nebraska Department of Roads. These midblock segments were originally selected by a technical committee from the Nebraska Department of Roads to investigate the effects of narrow lane width on urban roadway safety (Sharma et al., 2015), for which researchers focused mainly on regular vehicle crashes, and thus excluded animal crashes, alcohol-related
Results and discussions
Out of the 10 years of data, data from 2003 to 2011 were used for the model estimation, and the 2012 data were used for prediction.
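The estimation/prediction split described above can be sketched as a simple holdout by year. The record layout (dictionaries with `segment_id`, `year`, and `crashes` keys) and the function name are hypothetical and serve only to illustrate the split.

```python
def split_by_year(records, holdout_year=2012):
    """Split segment-year records into an estimation set and a prediction set.

    Records before `holdout_year` are used for model estimation; records
    from `holdout_year` itself are held out for prediction.
    """
    estimation = [r for r in records if r["year"] < holdout_year]
    prediction = [r for r in records if r["year"] == holdout_year]
    return estimation, prediction


# Toy data: two hypothetical segments observed yearly from 2003 to 2012.
toy_records = [
    {"segment_id": seg, "year": year, "crashes": 0}
    for seg in (1, 2)
    for year in range(2003, 2013)
]
estimation_set, prediction_set = split_by_year(toy_records)
```

Holding out the final year rather than a random subset keeps the prediction test out of sample in time, which matches the way the 2012 data are used here.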
Conclusions
In this study, we analyzed sideswipe (same direction), rear-end, and other crash types over 10 years (2003–2012) on 1,506 directional urban midblock segments in Lincoln and Omaha, Nebraska. Traffic operation and roadway geometry characteristics were investigated to identify significant influencing factors. Because of concern about unobserved heterogeneity produced by correlations across crash types and segments, excess zeros, and overdispersion in crash data, the multivariate random parameters
References (84)
- et al. Crash modeling for intersections and segments along corridors: a Bayesian multilevel joint model with random parameters. Analytic Methods in Accident Research (2017)
- Random parameters multivariate tobit and zero-inflated count data models: addressing unobserved and zero-state heterogeneity in accident injury-severity rate and frequency analysis. Analytic Methods in Accident Research (2016)
- et al. A multivariate tobit analysis of highway accident-injury-severity rates. Accident Analysis and Prevention (2012)
- et al. Roadway classifications and the accident injury severities of heavy-vehicle drivers. Analytic Methods in Accident Research (2017)
- et al. A full Bayesian multivariate count data model of collision severity with spatial correlation. Analytic Methods in Accident Research (2014)
- et al. Effects of spatial correlation in random parameters collision count-data models. Analytic Methods in Accident Research (2015)
- et al. Multivariate random parameters collision count data models with spatial heterogeneity. Analytic Methods in Accident Research (2016)
- et al. The effect of passengers on driver-injury severities in single-vehicle crashes: a random parameters heterogeneity-in-means approach. Analytic Methods in Accident Research (2017)
- et al. Determinants of bicyclist injury severities in bicycle-vehicle crashes: a random parameters approach with heterogeneity in means and variances. Analytic Methods in Accident Research (2017)
- et al. An empirical assessment of the effects of economic recessions on pedestrian-injury crashes using mixed and latent-class models. Analytic Methods in Accident Research (2016)