Statistical and machine-learning methods for clearance time prediction of road incidents: A methodology review

https://doi.org/10.1016/j.amar.2020.100123

Highlights

  • This study examines the performance of eight methods for predicting incident clearance time.

  • The results show “heterogeneity” models are superior to statistical models.

  • The significant factors of road incident clearance time for each model are illustrated.

  • This study provides analysts with insight into selecting a suitable modeling approach.

Abstract

Accurate prediction of road incident clearance time helps evaluate the extent of an incident's impact and supports route guidance strategies based on the predicted results, thereby reducing the travel delays caused by incidents. A number of approaches have been developed for predicting incident clearance time and investigating the effects of influential factors, and statistical and machine learning methods are the two major methodological approaches. This study reviews these methods by comprehensively examining their performance in incident clearance time prediction, especially when omitted variables have significant impacts on the selected variables. Specifically, we consider four widely used statistical models, namely the Accelerated Failure Time (AFT) model, Quantile Regression (QR) model, Finite Mixture (FM) model, and Random Parameters Hazard-Based Duration (RPHD) model, and four machine learning models, namely the K-Nearest Neighbor (KNN) model, Support Vector Machine (SVM) model, Back Propagation Neural Network (BPNN) model, and Random Forest (RF) model. The ability of these methods to uncover the underlying causality (explaining the causal effects of significant influential factors on clearance time) is also investigated. Incident clearance time data were collected on freeway sections in Seattle, Washington State, from 2009 to 2011. The conclusions can be summarized as follows: 1) the RF model and the RPHD model outperform the other three models in their respective methodological categories in data fitting and model prediction; 2) the three “heterogeneity” methods (RPHD, FM, and QR) outperform the machine learning methods in model prediction as measured by mean absolute percentage error (MAPE); 3) the machine learning methods perform more stably in model prediction than the statistical methods; 4) incident type and lane closure type have significant effects on incident clearance time in all eight selected models.

Introduction

Traffic incident management is of great importance to transportation agencies. Delays in clearing an incident directly increase the likelihood of a secondary incident and induce more severe traffic congestion (Mannering and Bhat, 2014, Chung et al., 2015). Reducing incident clearance time is regarded as the most important task in alleviating the impact of traffic incidents. To achieve this goal, traffic incident management has two basic requirements: understanding the influential factors and their impacts on incident clearance time, and accurately predicting the clearance time of an incident.

Over the past several decades, many methods have been proposed for modeling or predicting incident clearance time. Methodologically, these methods can be broadly divided into statistical methods and machine learning methods. Because they are built on rigorous mathematical assumptions and explicit functional forms, statistical methods can explain the mathematical relationship between the dependent variable (incident duration) and the explanatory variables (contributing factors). Linear regression is one of the earliest models used for incident duration prediction. However, it simply assumes a linear relationship between incident clearance time and the influential factors (Giuliano, 1989, Khattak et al., 1995, Garib et al., 1997, Cohen and Nouveliere, 1997, Valenti et al., 2010, Khattak et al., 2012). Unlike linear regression techniques, hazard-based duration models consider not only the length of incident duration but also the relationship between the elapsed duration and the probability that the incident will end in the next short time interval (Hensher and Mannering, 1994, Hojati et al., 2013, Zou et al., 2016). Starting with the early work of Jones et al. (1991), hazard-based duration models, such as the Proportional Hazards (PH) model, the Accelerated Failure Time (AFT) model and other hazard-based models, have been widely applied in modeling incident duration. With regard to machine learning, many promising approaches, such as the K-Nearest Neighbor method (Kim and Choi, 2001, Smith and Smith, 2001, Valenti et al., 2010, Wen et al., 2012), Support Vector Regression method (Wu et al., 2011), Bayesian Networks method (Ozbay and Noyan, 2006, Boyles et al., 2007) and Decision Trees method (Ma et al., 2017), have been widely used to predict incident clearance time.
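For reference, the quantity that hazard-based duration models work with, the conditional probability that an incident ends in the next short interval given that it has lasted until time t, is conventionally written as the hazard function. The notation below (T for the random duration, f and F for its density and cumulative distribution, S for the survival function, X for the covariates, β for the coefficients, ε for the error term) is standard textbook notation assumed here for illustration; it is not taken from this paper.

```latex
h(t) \;=\; \lim_{\Delta t \to 0} \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
      \;=\; \frac{f(t)}{1 - F(t)} \;=\; \frac{f(t)}{S(t)},
\qquad
\text{AFT form: } \ln T = \beta X + \varepsilon .
```

In the AFT form the covariates rescale (accelerate or decelerate) the duration itself, whereas a PH model lets the covariates multiply a baseline hazard.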

The reliability and efficiency of the aforementioned models rely heavily on the quality of the incident database used for model specification. Existing incident databases are typically extracted from official sources (e.g. transportation department reports and local governments). These conventional databases are usually collected over a long time period and across different locations and facilities to ensure an adequate sample size for analysis. Moreover, they often cover only a small fraction of the many elements that define incident-related features, operational strategies, traffic characteristics, and temporal and environmental conditions. Many other important elements, such as factors reflecting traffic status and the characteristics of operational workers and specialized equipment, cannot be collected or are even unobservable. If these omitted variables are significantly correlated with the selected variables, the omission can cause the estimated effects of the selected variables on incident clearance time to vary. Consequently, the model will be unreliable and the estimated parameters will be biased, resulting in erroneous inferences and predictions. The omitted factors constitute the so-called unobserved heterogeneity, which has been widely investigated in the context of traffic engineering (Mannering et al., 2016, Li et al., 2016, Han et al., 2018, Mannering, 2018, Li, 2018, Zhou and Lin, 2019, Huang et al., 2019, Yan et al., 2019, Yan et al., 2020). Many statistical methods have been developed to address the effect of unobserved heterogeneity, including the finite mixture (FM) model (Frühwirth-Schnatter, 2006, Zou et al., 2014, Zou et al., 2016), the random parameters (RP) model (Anastasopoulos and Mannering, 2016, Behnood and Mannering, 2017a, Behnood and Mannering, 2017b, Heydari et al., 2018) and the quantile regression (QR) model (Fitzenberger and Wilke, 2006, Zou et al., 2017).
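The bias mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: it uses a plain linear model rather than a duration model, and the names `ols_slope`, `simulate`, `rho`, `beta_x` and `beta_z` are hypothetical. An observed variable x and an omitted variable z both affect the outcome; when x and z are correlated, regressing on x alone pushes part of the effect of z into the estimated coefficient of x.

```python
import random


def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(rho, n=20000, beta_x=1.0, beta_z=2.0, seed=0):
    """Estimate the effect of x on y while the variable z is left out.

    rho is the assumed correlation between x and the omitted z.
    The true effect of x is beta_x; all values are illustrative.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # omitted (unobserved) factor
        # x has unit variance and correlation rho with z
        x = rho * z + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y = beta_x * x + beta_z * z + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)


if __name__ == "__main__":
    # Uncorrelated omission: the estimate stays close to the true beta_x.
    print("rho = 0.0:", round(simulate(0.0), 3))
    # Correlated omission: the estimate drifts toward beta_x + beta_z * rho.
    print("rho = 0.6:", round(simulate(0.6), 3))
```

With these settings the estimate should sit near the true value of 1 when rho is 0 and near 1 + 2 × 0.6 = 2.2 when rho is 0.6, which is the sense in which omitted factors correlated with the selected variables bias the estimated parameters.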

The objective of the present study is to provide a comprehensive review of the statistical and machine learning methods widely used in incident clearance time analysis. We are mainly concerned with their abilities in model prediction and inference, especially in handling potential unobserved heterogeneity. Four statistical models (given their superiority in survival data analysis, only hazard-based duration models and their extensions are explored) and four machine learning models are examined. Several studies have summarized the approaches to traffic incident duration prediction. Valenti et al. (2010) investigated the reliability of five incident duration prediction models for real-time application. However, that study paid little attention to the performance of hazard-based models. Araghi et al. (2014) compared the prediction performance of KNN and AFT in depth but did not give a comprehensive account of the performance of the two methods. Li et al. (2018) presented a systematic review of traffic incident duration studies covering data collection, factor investigation and model construction, but it was limited to a theoretical review.

The rest of the paper is structured as follows. The next section reviews previous studies on traffic incident duration analysis using various models. The four statistical models and four machine learning models are briefly introduced in Section 3. The data are described in Section 4. Section 5 presents the results of the eight selected models and discusses their performance in handling unobserved heterogeneity. The last section concludes the paper.

Literature review

According to the past efforts that have been made, there are mainly two objectives for modeling and analyzing traffic incident clearance time. The first objective is duration prediction, and it is usually the focus of machine learning methods. Due to the flexible structure of these methods, complex and highly nonlinear relationships between dependent and independent variables can be handled. Generally, in terms of structure, machine learning methods are categorized as distance metric learning

Methodology

Four statistical methods, namely accelerated failure time (AFT) method, finite mixture (FM) method, random parameters hazard-based duration (RPHD) method and quantile regression (QR) method, as well as four machine learning methods, including K-nearest neighbor (KNN) method, support vector machine (SVM) method, back propagation neural network (BPNN) method, and random forest (RF) method, are investigated and discussed in this study. This section presents a brief introduction to mentioned-above

Data source collection and variables selection

In this study, the traffic incident dataset was collected from the Washington Incident Tracking System (WITS) database, managed by the Washington State Department of Transportation. The data were collected on the I-5 corridor between Boeing Access Road (Milepost 157) and the Seattle Central Business District (Milepost 165). The site was selected because of its heavy traffic congestion and high frequency of incidents. Besides, the annual average daily traffic (AADT) data

Model results

The aim of this section is to comprehensively examine the performance of the selected eight models. Firstly, the performance of model prediction will be generally compared between two methodological approaches, i.e. statistical method and machine learning method. Then we will move on to compare the model prediction of the models within each of these two methodological categories. At the end of this section, the effects of all significant explanatory variables on clearance time will be analyzed.
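The abstract reports prediction accuracy as MAPE (mean absolute percentage error). As a minimal sketch of how such a comparison could be computed, the function below implements the standard MAPE definition; the function name `mape` and the clearance times in the usage example are hypothetical and are not taken from the paper's dataset.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    actual and predicted are equal-length sequences of clearance times
    (e.g. in minutes). Zero actual values are rejected because the
    percentage error is undefined for them.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical observed clearance times and two sets of predictions.
    observed = [20.0, 45.0, 10.0, 80.0]
    model_a = [25.0, 40.0, 12.0, 60.0]
    model_b = [22.0, 44.0, 11.0, 70.0]
    print("model A MAPE (%):", round(mape(observed, model_a), 2))
    print("model B MAPE (%):", round(mape(observed, model_b), 2))
```

Because each error is scaled by the observed value, MAPE penalizes a given absolute error more heavily on short clearance times than on long ones, which is worth keeping in mind when comparing models on incidents of very different durations.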

Conclusions

The study comprehensively reviews eight methods that have been widely used in traffic incident clearance time analysis. Four statistical methods (AFT, QR, FM, and RPHD) and four machine learning methods (KNN, SVM, BPNN, and RF) are selected. In particular, the performance of these methods in clearance time prediction and the ability in influential factors explanation are investigated based on the incident dataset collected from Washington Incident Tracking System. At first, the eight methods

Acknowledgments

The research is funded by the National Natural Science Foundation of China (No. 71701215), Innovation-Driven Project of Central South University (No. 2020CX041), Foundation of Central South University (No. 502045002), Postdoctoral Science Foundation of China (No. 2018M630914 and 2019T120716).

References (71)

  • S. Heydari et al.

    Benchmarking regions using a heteroskedastic grouped random parameters model with heterogeneity in mean and variance: applications to grade crossing safety analysis

    Analytic Methods in Accident Research

    (2018)
  • H. Huang et al.

    Modeling unobserved heterogeneity for zonal crash frequencies: a Bayesian multivariate random-parameters model with mixture components for spatially correlated data

    Analytic Methods in Accident Research

    (2019)
  • A. Iranitalab et al.

    Comparison of four statistical and machine learning methods for crash severity prediction

    Accident Analysis and Prevention

    (2017)
  • B. Jones et al.

    Analysis of the frequency and duration of freeway accidents in Seattle

    Accident Analysis and Prevention

    (1991)
  • M. Karlaftis et al.

    Statistical methods versus neural networks in transportation research: differences, similarities and some insights

    Transportation Research Part C: Emerging Technologies

    (2011)
  • H. Kim et al.

    A comparative analysis of incident service time on urban freeways

    IATSS Research

    (2001)
  • D. Li et al.

    Incorporating observed and unobserved heterogeneity in route choice analysis with sampled choice sets

    Transportation Research Part C

    (2016)
  • Z. Li

    Unobserved and observed heterogeneity in risk attitudes: implications for valuing travel time savings and travel time variability

    Transportation Research Part E

    (2018)
  • F. Mannering

    Temporal instability and the analysis of highway accident data

    Analytic Methods in Accident Research

    (2018)
  • F. Mannering et al.

    Analytic methods in accident research: methodological frontier and future directions

    Analytic Methods in Accident Research

    (2014)
  • F. Mannering et al.

    Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis

    Analytic Methods in Accident Research

    (2020)
  • F. Mannering et al.

    Unobserved heterogeneity and the statistical analysis of highway accident data

    Analytic Methods in Accident Research

    (2016)
  • J. Milton et al.

    Highway accident severities and the mixed logit model: an exploratory empirical analysis

    Accident Analysis and Prevention

    (2008)
  • D. Nam et al.

    An exploratory hazard-based analysis of highway incident duration

    Transportation Research Part A

    (2000)
  • K. Ozbay et al.

    Estimation of incident clearance times using Bayesian Networks approach

    Accident Analysis and Prevention

    (2006)
  • J. Tang et al.

    Crash injury severity analysis using a two-layer Stacking framework

    Accident Analysis and Prevention

    (2019)
  • C. Wei et al.

    Sequential forecast of incident duration using artificial neural network models

    Accident Analysis and Prevention

    (2007)
  • P. Xu et al.

    Modeling crash spatial heterogeneity: random parameter versus geographically weighting

    Accident Analysis and Prevention

    (2015)
  • P. Xu et al.

    Revisiting crash spatial heterogeneity: a Bayesian spatially varying coefficients approach

    Accident Analysis and Prevention

    (2017)
  • Y. Yan et al.

    Driving risk assessment using driving behavior data under continuous tunnel environment

    Traffic Injury Prevention

    (2019)
  • S. Zhou et al.

    Spatial-temporal heterogeneity of air pollution: The relationship between built environment and on-road PM2.5 at micro scale

    Transportation Research Part D

    (2019)
  • Y. Zou et al.

    Application of finite mixture models for analyzing freeway incident clearance time

    Transportmetrica A: Transport Science

    (2016)
  • Y. Zou et al.

    Jointly analyzing freeway traffic incident clearance and response time using a copula-based approach

    Transportation Research Part C

    (2018)
  • Y. Zou et al.

    Analyzing different functional forms of the varying weight parameter for finite mixture of negative binomial regression models

    Analytic Methods in Accident Research

    (2014)
  • P. Anastasopoulos et al.

    Empirical assessment of the likelihood and duration of highway project time delays

    Journal of Construction Engineering and Management

    (2012)