Modeling highly imbalanced crash severity data by ensemble methods and global sensitivity analysis
Journal of Transportation Safety & Security (IF 2.825), Pub Date: 2020-07-22, DOI: 10.1080/19439962.2020.1796863
Liming Jiang 1 , Yuanchang Xie 1 , Xiao Wen 1 , Tianzhu Ren 2

Abstract

Crash severity has been extensively studied and numerous methods have been developed for investigating the relationship between crash outcome and explanatory variables. Crash severity data are often characterized by highly imbalanced severity distributions, with most crashes in the Property-Damage-Only (PDO) category and the severe crash category making up only a fraction of the total observations. Many methods perform better on outcome categories with the most observations than other categories. This often leads to a high modeling accuracy for PDO crashes but poor accuracies for other severity categories. This research introduces two ensemble methods to model imbalanced crash severity data: AdaBoost and Gradient Boosting. It also adopts a more reasonable performance metric, F1 score, for model selection. It is found that AdaBoost and Gradient Boosting outperform other benchmark methods and generate more balanced prediction accuracies. Additionally, a global sensitivity analysis is adopted to determine the individual and joint impacts of explanatory factors on crash severity outcome. Vertical curve, seat belt use, accident type, road characteristics, and truck percentage are found to be the most influential factors. Finally, a simulation-based approach is used to further study how the impact of a particular factor may vary with respect to different value ranges.
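To illustrate why the abstract prefers the F1 score to plain accuracy on imbalanced severity data, here is a minimal sketch in pure Python. The toy labels, the 95/5 class split, the helper name `per_class_f1`, and the use of macro averaging are all assumptions made for illustration; they are not taken from the paper, which does not state its data split or averaging scheme here.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that appears in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data (not from the paper): 95 PDO crashes, 5 severe crashes.
y_true = ["PDO"] * 95 + ["severe"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["PDO"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)  # unweighted mean over classes (assumed)

print(f"accuracy = {accuracy:.3f}")
print(f"per-class F1 = {f1}")
print(f"macro F1 = {macro_f1:.3f}")
```

In this toy case the always-PDO model looks strong on accuracy, yet its F1 for the severe class is zero, which pulls the macro-averaged F1 below 0.5. That is the kind of imbalance-driven distortion that model selection by F1 is meant to expose.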




Updated: 2020-07-22