Multivariate random parameters zero-inflated negative binomial regression for analyzing urban midblock crashes

https://doi.org/10.1016/j.amar.2018.03.001

Highlights

  • Use the multivariate random parameters zero-inflated negative binomial (MVRPZINB) model to analyze three kinds of urban midblock crashes.

  • Show the superiority of the MVRPZINB model over other models in terms of goodness of fit and prediction accuracy.

  • Reveal the heterogeneous impacts of traffic operation and roadway geometric factors on crash frequency across crash types and segments.

Abstract

Urban midblock crashes are influenced mainly by traffic operation and roadway geometric features. In this paper, 10 years of crash data from 1,506 directional urban midblock segments in Nebraska were analyzed using the multivariate random parameters zero-inflated negative binomial model to account for unobserved heterogeneity produced by correlations across segments, correlations across collision types, excess zeros, and overdispersion. The multivariate random parameters zero-inflated negative binomial model outperformed many common crash frequency models in terms of both goodness of fit and prediction accuracy. Compared with the multivariate fixed parameters zero-inflated negative binomial model, the multivariate random parameters zero-inflated negative binomial model identified fewer key influencing factors and revealed segment-specific effects of these factors on different crash types. The results showed that the number of lanes, annual average daily traffic per lane, and segment length might have non-positive effects on crash frequencies. Segments with a speed limit of 45 mph had fewer crashes than those with lower speed limits, and segments in Omaha had fewer crashes than those in Lincoln. Neither lane width nor the presence of a shoulder, on-street parking, or one-way traffic had a significant influence on crash frequencies. These findings can help transportation agencies take correct and efficient measures to accommodate diverse transportation demands without reducing traffic safety. By contrast, the results of the fixed parameters model, although consistent with intuition, were insufficient to support actionable recommendations.

Introduction

Traffic crashes can be divided into junction crashes and non-junction crashes based on where they occur (National Center for Statistics and Analysis, 2017). Non-junction crashes, also referred to as midblock crashes, are crashes that occur on roadway segments. In 2015, they accounted for 41.7% of all crashes and 63.3% of fatal crashes in the United States (National Center for Statistics and Analysis, 2017). Thus, reducing midblock crashes is critical for improving traffic safety. Although midblock crashes are usually not directly influenced by junctions, they are strongly influenced by traffic operation and roadway geometric factors, which are much more complex on urban roadways than on rural roadways. On the one hand, urban roadway segments usually carry large traffic volumes and serve diverse traffic demands, which might increase crash opportunities; for example, adding crosswalks might increase the frequency of pedestrian crashes. On the other hand, urban development might limit or even reduce available roadway space, which might also increase crash risk; for example, vehicle lanes may be narrowed to make room for bike lanes and on-street parking. This predicament requires transportation agencies to determine which traffic operation and roadway geometric factors actually influence the frequency of urban midblock crashes so that they can take effective measures to accommodate traffic demands without reducing traffic safety.

Previous studies have shown that important traffic operation and roadway geometric factors influencing midblock crashes include traffic volume (Bonneson and McCoy, 1997, Greibe, 2003, Dumbaugh, 2006, Zhang et al., 2012, Manuel et al., 2014, Ferreira and Couto, 2015), speed limit (Greibe, 2003, Dumbaugh, 2006, Pande et al., 2010), on-street parking (Bonneson and McCoy, 1997, Greibe, 2003), lane width (Greibe, 2003, Manuel et al., 2014), median type (Bonneson and McCoy, 1997, Sawalha and Sayed, 2001), median width (Dumbaugh, 2006), number of lanes (Sawalha and Sayed, 2001, Greibe, 2003, Dumbaugh, 2006), land use (Bonneson and McCoy, 1997, Sawalha and Sayed, 2001, Greibe, 2003), pavement condition (Usman et al., 2010, Xiong et al., 2014, Zeng and Huang, 2014), and access points (Lee et al., 2011, Zeng and Huang, 2014), among others. However, findings have often been inconsistent across studies; that is, the same factor has sometimes had different effects in different studies. For example, speed limit was not significant for midblock crash frequencies on a 27-mile urban arterial in Florida Department of Transportation District 5 (Dumbaugh, 2006), whereas it was the most important variable for midblock crash frequencies on a 19.659-mile corridor of U.S. Route 19 in Pasco County, Florida (Pande et al., 2010). This inconsistency implies that, in practice, the effects of some factors on crashes might be location specific. Ignoring this unobserved heterogeneity might produce biased and inefficient parameter estimates, leading to erroneous inferences and predictions (Mannering et al., 2016).

One way to account for unobserved heterogeneity across observations in crash frequency analysis is to adopt random parameters count data models (Lord and Mannering, 2010, Chen and Tarko, 2014, Venkataraman et al., 2014, Barua et al., 2015, Barua et al., 2016, Coruh et al., 2015, Alarifi et al., 2017, Bhat et al., 2017, Chen et al., 2017, Rista et al., 2017). Unlike fixed parameters models, which assume that each factor has the same effect on all observations, random parameters models can capture observation-specific effects of factors on crash frequency. They have also been widely applied in crash injury severity analyses (Russo et al., 2014, Zhao and Khattak, 2015, Zhao and Khattak, 2017, Behnood and Mannering, 2016, Behnood and Mannering, 2017a, Behnood and Mannering, 2017b, Naik et al., 2016, Anderson and Hernandez, 2017, Fountas and Anastasopoulos, 2017, Seraneeprakarn et al., 2017) and crash rate analyses (Anastasopoulos, 2016). In particular, for data in which one entity has multiple observations, such as panel data, group-specific random parameters models may be adopted to account for heterogeneity among groups (Wu et al., 2013, Sarwar et al., 2017). More details about random parameters formulations can be found in Mannering et al. (2016).
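
To make group-specific random parameters concrete, the Python sketch below evaluates a simulated log-likelihood for a random parameters negative binomial model on panel data, where each segment keeps a single coefficient draw across all of its years. It is an illustration under stated assumptions, not necessarily the estimation procedure used in this paper: it assumes one covariate, a normally distributed slope, the NB2 variance form $\mu + \alpha\mu^2$, and pseudo-random normal draws (applied studies often use Halton draws instead). The function names `nb_logpmf` and `rp_nb_simulated_loglik` are hypothetical.

```python
import numpy as np
from math import lgamma

# Element-wise log-gamma from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(lgamma, otypes=[float])


def nb_logpmf(y, mu, alpha):
    """Log-pmf of an NB2 count with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    return (_lgamma(y + r) - _lgamma(r) - _lgamma(y + 1.0)
            + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))


def rp_nb_simulated_loglik(y, x, beta0, b_mean, b_sd, alpha, n_draws=200, seed=0):
    """Simulated log-likelihood of a hypothetical group-specific random parameters NB model.

    y, x: arrays of shape (n_segments, n_years); one covariate for brevity.
    The slope of segment i is b_i = b_mean + b_sd * e_i with e_i ~ N(0, 1),
    and the same b_i is shared by every year of that segment (panel structure).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # One standard normal draw per segment and replication, reused for all years.
    eps = rng.standard_normal((y.shape[0], n_draws))          # (S, R)
    b_i = b_mean + b_sd * eps                                 # segment-specific slopes
    mu = np.exp(beta0 + b_i[:, None, :] * x[:, :, None])      # (S, T, R) expected counts
    # Panel log-likelihood of each segment, conditional on each draw.
    ll = nb_logpmf(y[:, :, None], mu, alpha).sum(axis=1)      # (S, R)
    # Average the likelihood (not the log-likelihood) over draws, with a
    # log-sum-exp shift for numerical stability.
    m = ll.max(axis=1, keepdims=True)
    seg_ll = m[:, 0] + np.log(np.exp(ll - m).mean(axis=1))
    return float(seg_ll.sum())
```

Setting `b_sd=0` collapses every draw to the same slope, so the function reduces to the fixed parameters NB log-likelihood. A simulated maximum likelihood estimator would maximize this quantity over `beta0`, `b_mean`, `b_sd`, and `alpha`.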

Crash data can usually be divided into multiple types based on different criteria. For example, midblock crashes can be divided by collision type: rear-end, right-angle, sideswipe (same direction), single-vehicle, overturn, and so on. A single factor might affect different collision types differently. Thus, identifying the significant factors for each collision type is important for transportation agencies so that they can target countermeasures at specific collision types. When these crashes are analyzed jointly, multivariate count data models are necessary, because univariate models ignore the unobserved heterogeneity often shared across crash types and may therefore produce biased and inefficient results (Huang et al., 2008, Dong et al., 2014a, Mannering et al., 2016). Most multivariate count data models in the literature were derived from the multivariate Poisson log-normal (MVPLN) model (Ma et al., 2008, El-Basyouny and Sayed, 2009, Aguero-Valverde and Jovanis, 2010, Barua et al., 2014, Zhan et al., 2015, Serhiyenko et al., 2016, Huang et al., 2017, Osama and Sayed, 2017, Zhao et al., 2017, Wang et al., 2018), which is flexible enough to accommodate various correlations among crash types but does not work well for crash data with excess zeros (Dong et al., 2014a). In addition to the MVPLN model, the natural extensions of the Poisson and negative binomial (NB) models to multivariate data, i.e., the multivariate Poisson (MVP) model (Johnson et al., 1997, Ma and Kockelman, 2006) and the multivariate negative binomial (MVNB) model (Anastasopoulos et al., 2012, Chen et al., 2017), have also been used in some studies. The MVP and MVNB models allow only positive correlations across crash types, and they cannot handle crash data with excess zeros either, because the marginal distribution of each crash type is still Poisson or negative binomial.

Zero-inflated models are often adopted for univariate count data with excess zeros (Lambert, 1992, Lord et al., 2005). Zero-inflated models offer two explanations for the excess zeros in crash frequency data. The first is a two-state crash-generating process: (i) a normal count state and (ii) an accident-free state, which can be thought of as a nearly safe state in which accidents occur extremely rarely (Malyshkina and Mannering, 2010). The second is a two-state crash-reporting process: (i) a crash-underreporting state, in which accidents occur but are not reported for some reason (for example, minor crashes that need not be reported, or hit-and-run crashes), and (ii) a normal crash-reporting state, in which all accidents that occur are reported. This explanation applies to many scenarios, as crash underreporting has been found to be common in practice (Hauer and Hakkert, 1988, Elvik and Mysen, 1999, Yamamoto et al., 2008, Lord and Mannering, 2010, Yannis et al., 2014). Either explanation may justify applying zero-inflated models in our case, although the data alone cannot reveal which one is true. When the crash observations at each level of classification contain a large number of zeros, the zero-inflated versions of the multivariate Poisson and negative binomial models, i.e., the multivariate zero-inflated Poisson (MVZIP) model (Li et al., 1999) and the multivariate zero-inflated negative binomial (MVZINB) model, are recommended. In traffic safety studies, the MVZIP model was first used to examine crash frequency at signalized intersections in Tennessee, where it performed better than the univariate zero-inflated Poisson (UZIP) and MVPLN models in terms of goodness of fit and prediction accuracy (Dong et al., 2014b).
To account for overdispersion and unobserved heterogeneity across individual sites, Dong et al. (2014a) used the multivariate random parameters zero-inflated negative binomial (MVRPZINB) model in another crash frequency study, with random parameters assumed for the count part. Later, Anastasopoulos (2016) also adopted the MVRPZINB model in a crash frequency analysis, assuming random parameters for both the count part and the zero-state part, which makes the model more flexible. Both studies found that random parameters models were superior to fixed parameters models in terms of goodness of fit and prediction accuracy.
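
The two-state mechanism behind zero-inflated models can be written down directly. The Python sketch below computes univariate zero-inflated negative binomial probabilities, assuming the NB2 parameterization (variance $\mu + \alpha\mu^2$) and a constant zero-state probability `pi`; in practice `pi` is typically linked to covariates through a logit model. The helper names are hypothetical, and the sketch does not implement the multivariate or random parameters versions discussed above.

```python
from math import exp, lgamma, log


def nb_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + mu)) + y * log(mu / (r + mu)))
    return exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB probability of count y.

    With probability pi the site is in the zero state and always yields 0;
    with probability 1 - pi the count follows the NB2 distribution.
    """
    count_part = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + count_part if y == 0 else count_part
```

For example, with `mu=2.0`, `alpha=0.5`, and `pi=0.3`, the NB probability of a zero is 0.25, so the zero-inflated probability of a zero is 0.3 + 0.7 × 0.25 = 0.475.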

This paper applies the multivariate random parameters zero-inflated negative binomial model to analyze urban midblock crashes by collision type. Here, midblock crashes refer to non-junction crashes that occurred on urban midblock segments bounded by signalized intersections. The objectives of this study were (i) to identify important traffic operation and roadway geometric factors influencing urban midblock crash frequencies by collision type and (ii) to thoroughly evaluate how well the multivariate random parameters zero-inflated negative binomial model accounts for unobserved heterogeneity produced by correlations across crash types, correlations across sites, excess zeros, and overdispersion. The results demonstrate the superiority of the multivariate random parameters zero-inflated negative binomial model over many common crash frequency analysis models.


The multivariate zero-inflated negative binomial model

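For an $m$-dimensional observation $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_m)$, the multivariate negative binomial model is defined as (Dong et al., 2014a):

$$
\begin{aligned}
Y_1 &= Z_1 + U \\
Y_2 &= Z_2 + U \\
&\;\;\vdots \\
Y_m &= Z_m + U
\end{aligned}
$$

where $m$ is the dimension of $\mathbf{Y}$, and $Z_1, Z_2, \ldots, Z_m$ and $U$ are independent negative binomial variables with respective means $\lambda_{10}, \lambda_{20}, \ldots, \lambda_{m0}$ and $\lambda_{00}$.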
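
The common-component construction can be checked by simulation. The Python sketch below draws $Z_1, \ldots, Z_m$ and $U$ as independent negative binomial variables and adds $U$ to every component. Because the components share only $U$, $\mathrm{Cov}(Y_j, Y_k) = \mathrm{Var}(U)$ for $j \neq k$, which can never be negative. The sketch assumes the NB2 parameterization (variance $\lambda + \alpha\lambda^2$) with one dispersion parameter $\alpha$ shared by all components; the function names are hypothetical, and the exact parameterization in the cited model may differ.

```python
import numpy as np


def draw_nb(rng, mean, alpha, size):
    """NB2 draws with the given mean and variance mean + alpha * mean**2.

    NumPy's negative_binomial(n, p) counts failures before n successes
    (mean n(1-p)/p), so n = 1/alpha and p = 1/(1 + alpha * mean) give NB2.
    """
    n = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mean)
    return rng.negative_binomial(n, p, size=size)


def simulate_mvnb(lambdas, lambda0, alpha, size, seed=0):
    """Y_j = Z_j + U for j = 1..m, with Z_j and the shared U independent NB2."""
    rng = np.random.default_rng(seed)
    u = draw_nb(rng, lambda0, alpha, size)                    # common component U
    z = np.column_stack([draw_nb(rng, lam, alpha, size) for lam in lambdas])
    return z + u[:, None]                                     # shape (size, m)
```

With `lambdas=[2.0, 3.0]`, `lambda0=1.0`, and `alpha=0.5`, the sample covariance of the two columns should be close to $\mathrm{Var}(U) = 1.0 + 0.5 \times 1.0^2 = 1.5$, and the mean of column $j$ should be close to $\lambda_{j0} + \lambda_{00}$.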

An $m$-dimensional multivariate negative binomial model is constructed in this way from $(m+1)$ independent negative binomial variables. The elements of $\mathbf{Y}$ are positively correlated with each other because they share the component $U$, which is

Data description

Yearly crash frequency data for 1,506 directional urban midblock segments in Lincoln and Omaha, Nebraska, from 2003 to 2012 were collected from the Nebraska Department of Roads. These midblock segments were originally selected by a technical committee from the Nebraska Department of Roads to investigate the effects of narrow lane width on urban roadway safety (Sharma et al., 2015); that study focused mainly on regular vehicle crashes and thus excluded animal crashes, alcohol-related

Results and discussions

Of the 10 years of data, the 2003–2011 data were used for model estimation, and the 2012 data were used for prediction.
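
The temporal holdout can be sketched as follows. The record layout `(segment_id, year, crashes)`, the function names, and the choice of mean absolute deviation and root mean squared error as accuracy measures are assumptions for illustration; this section does not state which prediction accuracy measures were used.

```python
from math import sqrt


def split_by_year(records, last_estimation_year=2011):
    """Split hypothetical (segment_id, year, crashes) records by year.

    Years up to last_estimation_year go to estimation; later years are held out.
    """
    estimation = [r for r in records if r[1] <= last_estimation_year]
    holdout = [r for r in records if r[1] > last_estimation_year]
    return estimation, holdout


def mad(observed, predicted):
    """Mean absolute deviation between observed and predicted crash counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted crash counts."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

For instance, observed holdout counts `[2, 0]` against predictions `[1.5, 0.5]` give a mean absolute deviation and a root mean squared error of 0.5 each.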

Conclusions

In this study, we analyzed sideswipe (same direction), rear-end, and other crashes over 10 years (2003–2012) on 1,506 directional urban midblock segments in Lincoln and Omaha, Nebraska. Traffic operation and roadway geometry characteristics were investigated to identify significant influencing factors. Because of concerns about unobserved heterogeneity produced by correlations across crash types and segments, excess zeros, and overdispersion in crash data, the multivariate random parameters

References (84)

  • C.R. Bhat et al.

    A new spatial and flexible multivariate random-coefficients model for the analysis of pedestrian injury counts by severity level

    Analytic Methods in Accident Research

    (2017)
  • E. Chen et al.

    Modeling safety of highway work zones with random parameters and random effects models

    Analytic Methods in Accident Research

    (2014)
  • S. Chen et al.

    Impact of road-surface condition on rural highway safety: a multivariate random parameters negative binomial approach

    Analytic Methods in Accident Research

    (2017)
  • E. Coruh et al.

    Accident analysis with the random parameters negative binomial panel count data model

    Analytic Methods in Accident Research

    (2015)
  • C. Dong et al.

    Multivariate random-parameters zero-inflated negative binomial regression model: an application to estimate crash frequencies at intersections

    Accident Analysis and Prevention

    (2014)
  • C. Dong et al.

    Examining signalized intersection crash frequency using multivariate zero-inflated Poisson regression

    Safety Science

    (2014)
  • K. El-Basyouny et al.

    Collision prediction models using multivariate Poisson-lognormal regression

    Accident Analysis and Prevention

    (2009)
  • S. Ferreira et al.

    A probabilistic approach towards a crash risk assessment of urban segments

    Transportation Research Part C

    (2015)
  • G. Fountas et al.

    A random thresholds random parameters hierarchical ordered probit analysis of highway accident injury-severities

    Analytic Methods in Accident Research

    (2017)
  • P. Greibe

    Accident prediction models for urban roads

    Accident Analysis and Prevention

    (2003)
  • H. Huang et al.

    Severity of driver injury and vehicle damage in traffic crashes at intersections: a Bayesian hierarchical analysis

    Accident Analysis and Prevention

    (2008)
  • H. Huang et al.

    A multivariate spatial model of crash frequency by transportation modes for urban intersections

    Analytic Methods in Accident Research

    (2017)
  • C. Lee et al.

    Development of crash modification factors for changing lane width on roadway segments using generalized nonlinear models

    Accident Analysis and Prevention

    (2015)
  • C. Liu et al.

    Exploring spatio-temporal effects in traffic crash trend analysis

    Analytic Methods in Accident Research

    (2017)
  • C. Liu et al.

    Using the multivariate spatio-temporal Bayesian model to analyze traffic crashes by severity

    Analytic Methods in Accident Research

    (2018)
  • D. Lord et al.

    The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives

    Transportation Research Part A

    (2010)
  • D. Lord et al.

    Poisson, Poisson-Gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory

    Accident Analysis and Prevention

    (2005)
  • J. Ma et al.

    A multivariate Poisson-lognormal regression model for prediction of crash counts by severity, using Bayesian methods

    Accident Analysis and Prevention

    (2008)
  • X. Ma et al.

    Multivariate space-time modeling of crash frequencies by injury severity levels

    Analytic Methods in Accident Research

    (2017)
  • N.V. Malyshkina et al.

    Zero-state Markov switching count-data models: an empirical assessment

    Accident Analysis and Prevention

    (2010)
  • N.V. Malyshkina et al.

    Markov switching negative binomial models: an application to vehicle accident frequencies

    Accident Analysis and Prevention

    (2009)
  • F.L. Mannering

    Temporal instability and the analysis of highway accident data

    Analytic Methods in Accident Research

    (2018)
  • F.L. Mannering et al.

    Unobserved heterogeneity and the statistical analysis of highway accident data

    Analytic Methods in Accident Research

    (2016)
  • A. Manuel et al.

    Investigating the safety effects of road width on urban collector roadways

    Safety Science

    (2014)
  • S.P. Miaou et al.

    Bayesian ranking of sites for engineering safety improvements: decision parameter, treatability concept, statistical criterion, and spatial dependence

    Accident Analysis and Prevention

    (2005)
  • B. Naik et al.

    Weather impacts on single-vehicle truck crash injury severity

    Journal of Safety Research

    (2016)
  • A. Osama et al.

    Investigating the effect of spatial and mode correlations on active transportation safety modeling

    Analytic Methods in Accident Research

    (2017)
  • A. Pande et al.

    A classification tree based modeling approach for segment related crashes on multilane highways

    Journal of Safety Research

    (2010)
  • J. Park et al.

    Evaluation of safety effectiveness of multiple cross sectional features on urban arterials

    Accident Analysis and Prevention

    (2016)
  • B.J. Russo et al.

    Comparison of factors affecting injury severity in angle collisions by fault status using a random parameters bivariate ordered probit model

    Analytic Methods in Accident Research

    (2014)
  • M.T. Sarwar et al.

    Grouped random parameters bivariate probit analysis of perceived and observed aggressive driving behavior: a driving simulation study

    Analytic Methods in Accident Research

    (2017)
  • P. Seraneeprakarn et al.

    Occupant injury severities in hybrid-vehicle involved crashes: a random parameters approach with heterogeneity in means and variances

    Analytic Methods in Accident Research

    (2017)