Statistical and machine-learning methods for clearance time prediction of road incidents: A methodology review

doi:10.1016/j.amar.2020.100123

Analytic Methods in Accident Research

Volume 27, September 2020, 100123

https://doi.org/10.1016/j.amar.2020.100123

Highlights

• This study examines the performance of eight methods for predicting incident clearance time.
• The results show that “heterogeneity” models are superior to statistical models.
• The significant factors of road incident clearance time are illustrated for each model.
• This study gives analysts insight into selecting a suitable modeling approach.

Abstract

Accurate prediction of road incident clearance time helps to evaluate the range of an incident's impact and to provide route guidance based on the predicted results, thereby reducing the travel delays caused by incidents. A number of approaches have been developed for predicting incident clearance time and investigating the effects of influential factors. Statistical and machine learning methods are the two major methodological approaches. This study reviews these methods by comprehensively examining their performance in incident clearance time prediction, especially when omitted variables have significant impacts on the selected variables. Specifically, we consider four widely used statistical models as candidates: the Accelerated Failure Time (AFT) model, the Quantile Regression (QR) model, the Finite Mixture (FM) model, and the Random Parameters Hazard-Based Duration (RPHD) model; and four machine learning models: the K-Nearest Neighbor (KNN) model, the Support Vector Machine (SVM) model, the Back Propagation Neural Network (BPNN) model, and the Random Forest (RF) model. Moreover, the ability of these methods to uncover the underlying causality (explaining the causal effects of significant influential factors on clearance time) is also investigated. Incident clearance time data were collected on freeway sections in Seattle, Washington State, from 2009 to 2011. The conclusions can be summarized as follows: 1) the RF model and the RPHD model each outperform the other three models in their respective methodological categories in data fitting and model prediction; 2) three “heterogeneity” methods, namely RPHD, FM and QR, outperform machine learning methods in model prediction as measured by MAPE; 3) machine learning methods perform stably in model prediction relative to the statistical methods; 4) incident type and lane closure type have significant effects on incident clearance time in all eight selected models.

Introduction

Traffic incident management is of great importance to transportation agencies. Delays in clearing an incident directly increase the likelihood of a secondary incident and induce more severe traffic congestion (Mannering and Bhat, 2014, Chung et al., 2015). Reducing incident clearance time is regarded as the most important task in alleviating the impact of traffic incidents. To achieve this goal, traffic incident management has two basic requirements: understanding the influential factors and their impacts on incident clearance time, and accurately predicting the clearance time of an incident.

In the past several decades, many methods have been proposed for modeling or predicting incident clearance time and incident duration. Methodologically, these methods can be generally divided into statistical methods and machine learning methods. Because they are built on rigorous mathematical assumptions and functional forms, statistical methods can explain the mathematical relationship between the dependent variable (incident duration) and the explanatory variables (contributing factors). Linear regression is one of the earliest regression techniques applied to incident prediction. However, it simply assumes a linear relationship between the length of incident clearance time and the influential factors (Giuliano, 1989, Khattak et al., 1995, Garib et al., 1997, Cohen and Nouveliere, 1997, Valenti et al., 2010, Khattak et al., 2012). Unlike linear regression techniques, hazard-based duration models consider not only the length of incident duration but also the relationship between the elapsed duration and the probability that the incident will end in the next short time interval (Hensher and Mannering, 1994, Hojati et al., 2013, Zou et al., 2016). Starting with the early work of Jones et al. (1991), hazard-based duration models, such as the Proportional Hazards (PH) model, the Accelerated Failure Time (AFT) model and other hazard-based models, have been widely applied in modeling incident duration. With regard to machine learning, many promising approaches have been widely used to predict incident clearance time, such as the K-Nearest Neighbor method (Kim and Choi, 2001, Smith and Smith, 2001, Valenti et al., 2010, Wen et al., 2012), the Support Vector Regression method (Wu et al., 2011), the Bayesian Networks method (Ozbay and Noyan, 2006, Boyles et al., 2007) and the Decision Trees method (Ma et al., 2017).
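For reference, the hazard-based framework described above can be written compactly as follows. The notation ($T$ for clearance time, $\mathbf{X}$ for the vector of influential factors, $\boldsymbol{\beta}$ for the coefficients, $h_0$ for a baseline hazard) is a generic textbook convention assumed here for illustration and is not necessarily the notation used in the paper's methodology section.

```latex
% T: incident clearance time with density f(t) and distribution function F(t)
\begin{align}
  S(t) &= P(T \ge t) = 1 - F(t)
       && \text{(survival function)} \\
  h(t) &= \lim_{\Delta t \to 0}
          \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
        = \frac{f(t)}{S(t)}
       && \text{(hazard function)} \\
  h(t \mid \mathbf{X}) &= h_0(t)\,\exp(\boldsymbol{\beta}\mathbf{X})
       && \text{(proportional hazards form)} \\
  \ln T &= \boldsymbol{\beta}\mathbf{X} + \varepsilon
       && \text{(accelerated failure time form)}
\end{align}
```

The hazard $h(t)$ is the conditional rate at which an incident that has already lasted until time $t$ is cleared in the next short interval, which is the quantity a plain linear regression of $T$ on $\mathbf{X}$ does not model.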

The reliability and efficiency of the aforementioned models rely heavily on the quality of the incident database used for model specification. However, existing incident databases are typically extracted from the authorities (e.g. transportation department reports and local governments). These conventional databases are usually collected over a long time period and across different locations and facilities to ensure an adequate sample size for analysis. Moreover, these databases often cover only a small fraction of the many elements that define incident-related features, operational strategies, traffic characteristics, and temporal and environmental conditions. Many other important elements, such as factors reflecting the traffic status and the characteristics of operational workers and specialized equipment, remain uncollectable or even unobservable in the analysis. If these omitted variables are significantly correlated with the selected variables, the omission could generate variations in the effects of the selected variables on incident clearance time. Consequently, the model will be unreliable and the estimated parameters will be biased, resulting in erroneous inferences and predictions. The omitted factors constitute the so-called unobserved heterogeneity, which has been widely investigated in the context of traffic engineering (Mannering et al., 2016, Li et al., 2016, Han et al., 2018, Mannering, 2018, Li, 2018, Zhou and Lin, 2019, Huang et al., 2019, Yan et al., 2019, Yan et al., 2020). Many statistical methods have been developed to address the effect of unobserved heterogeneity, including the finite mixture (FM) model (Frühwirth-Schnatter, 2006, Zou et al., 2014, Zou et al., 2016), the random parameters (RP) model (Anastasopoulos and Mannering, 2016, Behnood and Mannering, 2017a, Behnood and Mannering, 2017b, Heydari et al., 2018) and the quantile regression (QR) model (Fitzenberger and Wilke, 2006, Zou et al., 2017).
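The bias mechanism described above can be illustrated with a small simulation. This is a minimal sketch under assumed values: the roles of the variables (an observed factor `x`, an unobserved factor `z` correlated with it, and a log clearance time `y`), the coefficients 0.5, 0.7 and 0.8, and the noise levels are all hypothetical and do not come from the Seattle data.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after a least-squares regression on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Return (slope of x with z omitted, slope of x with z controlled for)."""
    rng = random.Random(seed)
    # Observed factor (hypothetical), standardized.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Unobserved factor (hypothetical), correlated with x by construction.
    z = [0.8 * xi + rng.gauss(0.0, 0.6) for xi in x]
    # Assumed true model: the effect of x on y is 0.5, the effect of z is 0.7.
    y = [0.5 * xi + 0.7 * zi + rng.gauss(0.0, 0.5) for xi, zi in zip(x, z)]

    omitted = slope(x, y)  # z left out of the model
    # Partial out z from both x and y, then regress the residuals; this gives
    # the coefficient of x in the regression of y on both x and z.
    controlled = slope(residuals(x, z), residuals(y, z))
    return omitted, controlled


if __name__ == "__main__":
    omitted, controlled = simulate()
    print(f"slope with z omitted:    {omitted:.3f}")
    print(f"slope with z controlled: {controlled:.3f}")
```

With these assumed values, the slope estimated without `z` settles near 0.5 + 0.7 × 0.8 = 1.06 rather than the true 0.5, while controlling for `z` recovers a value close to 0.5. In practice the omitted factor cannot be controlled for, which is why the heterogeneity models discussed here are of interest.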

The objective of the present study is to provide a comprehensive review of the statistical and machine learning methods widely used in incident clearance time analysis. We focus on their abilities in model prediction and inference, especially in handling potential unobserved heterogeneity. Four statistical models (given their superiority in survival data analysis, only hazard-based duration models and their extensions are explored) and four machine learning models are examined. Several studies have summarized the approaches to traffic incident duration prediction. Valenti et al. (2010) investigated the reliability of five incident duration prediction models for real-time application. However, that study paid little attention to the performance of hazard-based models. Araghi et al. (2014) compared the prediction performance of KNN and AFT in depth but was not comprehensive in illustrating the performance of these two methods. Li et al. (2018) presented a systematic review of traffic incident duration studies, including data collection, factor investigation and model construction, but it was limited to a theoretical review.

The rest of the paper is structured as follows. The next section reviews previous studies on traffic incident duration analysis using various models. The four statistical models and four machine learning models are briefly introduced in Section 3. The data are described in Section 4. Section 5 presents the results of the eight selected models and discusses their performance in handling unobserved heterogeneity. The last section concludes the paper.

Section snippets

Literature review

Past efforts show that there are mainly two objectives in modeling and analyzing traffic incident clearance time. The first objective is duration prediction, which is usually the focus of machine learning methods. Because of their flexible structure, these methods can handle complex and highly nonlinear relationships between dependent and independent variables. Generally, in terms of structure, machine learning methods are categorized as distance metric learning

Methodology

Four statistical methods, namely accelerated failure time (AFT) method, finite mixture (FM) method, random parameters hazard-based duration (RPHD) method and quantile regression (QR) method, as well as four machine learning methods, including K-nearest neighbor (KNN) method, support vector machine (SVM) method, back propagation neural network (BPNN) method, and random forest (RF) method, are investigated and discussed in this study. This section presents a brief introduction to mentioned-above

Data source collection and variables selection

In this study, the traffic incident dataset was collected from the Washington Incident Tracking System (WITS) database, managed by the Washington Department of Transportation. The data were collected on the I-5 corridor between Boeing Access Road (Milepost 157) and the Seattle Central Business District (Milepost 165). The site was selected because of its heavy traffic congestion and high frequency of incidents. Besides, the annual average daily traffic (AADT) data

Model results

The aim of this section is to comprehensively examine the performance of the selected eight models. Firstly, the performance of model prediction will be generally compared between two methodological approaches, i.e. statistical method and machine learning method. Then we will move on to compare the model prediction of the models within each of these two methodological categories. At the end of this section, the effects of all significant explanatory variables on clearance time will be analyzed.
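The abstract reports prediction accuracy in terms of MAPE (mean absolute percentage error). As a minimal sketch, assuming the standard definition (the paper's exact formula is not shown in this excerpt), it can be computed as below. The function name `mape` and the sample clearance times are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes the standard definition: the mean of |actual - predicted| / |actual|.
    Observations with an actual value of zero are rejected because the ratio
    is undefined for them.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical observed and predicted clearance times, in minutes.
    observed = [10.0, 20.0, 40.0]
    forecast = [12.0, 18.0, 30.0]
    print(f"MAPE = {mape(observed, forecast):.2f}%")  # about 18.33%
```

Because each error is scaled by the observed value, short incidents with a fixed absolute error weigh more heavily than long ones, which is worth keeping in mind when comparing models on skewed clearance time data.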

Conclusions

The study comprehensively reviews eight methods that have been widely used in traffic incident clearance time analysis. Four statistical methods (AFT, QR, FM, and RPHD) and four machine learning methods (KNN, SVM, BPNN, and RF) are selected. In particular, the performance of these methods in clearance time prediction and their ability to explain influential factors are investigated based on the incident dataset collected from the Washington Incident Tracking System. At first, the eight methods

Acknowledgments

The research is funded by the National Natural Science Foundation of China (No. 71701215), Innovation-Driven Project of Central South University (No. 2020CX041), Foundation of Central South University (No. 502045002), Postdoctoral Science Foundation of China (No. 2018M630914 and 2019T120716).

References (71)

P. Anastasopoulos et al. The effect of speed limits on drivers' choice of speed: a random parameters seemingly unrelated equations approach. Analytic Methods in Accident Research (2016).
A. Behnood et al. The effect of passengers on driver-injury severities in single-vehicle crashes: a random parameters heterogeneity-in-means approach. Analytic Methods in Accident Research (2017).
A. Behnood et al. Determinants of bicyclist injury severities in bicycle-vehicle crashes: a random parameters approach with heterogeneity in means and variances. Analytic Methods in Accident Research (2017).
C. Bhat. Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research Part B: Methodological (2003).
Y. Chung. Development of an accident duration prediction model on the Korean freeway systems. Accident Analysis and Prevention (2010).
Y. Chung et al. Simultaneous equation modeling of freeway accident duration and lanes blocked. Analytic Methods in Accident Research (2015).
S. Cohen et al. Modeling incident duration on an urban expressway. IFAC Proceedings Volumes (1997).
C. Ding et al. Exploring the influential factors in incident clearance time: disentangling causation from self-selection bias. Accident Analysis and Prevention (2015).
G. Giuliano. Incident characteristics, frequency, and duration on a high volume urban freeway. Transportation Research Part A: General (1989).
C. Han et al. Investigating varying effect of road-level factors on crash frequency across regions: a Bayesian hierarchical random parameter modeling approach. Analytic Methods in Accident Research (2018).
S. Heydari et al. Benchmarking regions using a heteroskedastic grouped random parameters model with heterogeneity in mean and variance: applications to grade crossing safety analysis. Analytic Methods in Accident Research (2018).
H. Huang et al. Modeling unobserved heterogeneity for zonal crash frequencies: a Bayesian multivariate random-parameters model with mixture components for spatially correlated data. Analytic Methods in Accident Research (2019).
A. Iranitalab et al. Comparison of four statistical and machine learning methods for crash severity prediction. Accident Analysis and Prevention (2017).
B. Jones et al. Analysis of the frequency and duration of freeway accidents in Seattle. Accident Analysis and Prevention (1991).
M. Karlaftis et al. Statistical methods versus neural networks in transportation research: differences, similarities and some insights. Transportation Research Part C: Emerging Technologies (2011).
H. Kim et al. A comparative analysis of incident service time on urban freeways. IATSS Research (2001).
D. Li et al. Incorporating observed and unobserved heterogeneity in route choice analysis with sampled choice sets. Transportation Research Part C (2016).
Z. Li. Unobserved and observed heterogeneity in risk attitudes: implications for valuing travel time savings and travel time variability. Transportation Research Part E (2018).
F. Mannering. Temporal instability and the analysis of highway accident data. Analytic Methods in Accident Research (2018).
F. Mannering et al. Analytic methods in accident research: methodological frontier and future directions. Analytic Methods in Accident Research (2014).
F. Mannering et al. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Analytic Methods in Accident Research (2020).
F. Mannering et al. Unobserved heterogeneity and the statistical analysis of highway accident data. Analytic Methods in Accident Research (2016).
J. Milton et al. Highway accident severities and the mixed logit model: an exploratory empirical analysis. Accident Analysis and Prevention (2008).
D. Nam et al. An exploratory hazard-based analysis of highway incident duration. Transportation Research Part A (2000).
K. Ozbay et al. Estimation of incident clearance times using Bayesian Networks approach. Accident Analysis and Prevention (2006).
J. Tang et al. Crash injury severity analysis using a two-layer Stacking framework. Accident Analysis and Prevention (2019).
C. Wei et al. Sequential forecast of incident duration using artificial neural network models. Accident Analysis and Prevention (2007).
P. Xu et al. Modeling crash spatial heterogeneity: random parameter versus geographically weighting. Accident Analysis and Prevention (2015).
P. Xu et al. Revisiting crash spatial heterogeneity: a Bayesian spatially varying coefficients approach. Accident Analysis and Prevention (2017).
Y. Yan et al. Driving risk assessment using driving behavior data under continuous tunnel environment. Traffic Injury Prevention (2019).
S. Zhou et al. Spatial-temporal heterogeneity of air pollution: The relationship between built environment and on-road PM2.5 at micro scale. Transportation Research Part D (2019).
Y. Zou et al. Application of finite mixture models for analyzing freeway incident clearance time. Transportmetrica A: Transport Science (2016).
Y. Zou et al. Jointly analyzing freeway traffic incident clearance and response time using a copula-based approach. Transportation Research Part C (2018).
Y. Zou et al. Analyzing different functional forms of the varying weight parameter for finite mixture of negative binomial regression models. Analytic Methods in Accident Research (2014).
P. Anastasopoulos et al. Empirical assessment of the likelihood and duration of highway project time delays. Journal of Construction Engineering and Management (2012).

