Transportmetrica A: Transport Science (IF 3.3), Pub Date: 2020-12-23, DOI: 10.1080/23249935.2020.1858993
Mohammadali Shirazi, Srinivas Reddy Geedipally, Dominique Lord
Crash data are often characterized by numerous zero observations. Sometimes, the number of zero observations is directly correlated with the spatial and/or temporal scales selected for data aggregation. Finding a balance in aggregation is a critical task in data preparation. On the one hand, using disaggregated data may result in excessive zero observations, for which the popular negative binomial model may not be adequate for the safety analysis. On the other hand, too much aggregation may result in loss of information. This paper documents a simulation study that aimed to determine criteria for deciding when data aggregation is needed. The simulation study explores the information loss due to aggregation as a function of the precision or accuracy with which the model coefficients are estimated. The simulation results indicate that the reduction in variability (i.e., the coefficient of variation) of the independent variables after aggregation is an important criterion for deciding on the aggregation level.
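The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a minimal illustration under assumed settings. The function names, the lognormal covariate, and the values of `beta0`, `beta1`, and `alpha` are all hypothetical. It generates negative binomial counts per site and period, aggregates them over time, and compares the share of zero observations and the coefficient of variation of the covariate before and after aggregation.

```python
import numpy as np


def coefficient_of_variation(x):
    """Standard deviation divided by the mean."""
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()


def simulate_and_aggregate(n_sites=2000, n_periods=12, seed=0):
    """Simulate site-by-period counts, then aggregate over periods.

    All parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical covariate that varies by site and period.
    x = rng.lognormal(mean=0.0, sigma=0.5, size=(n_sites, n_periods))

    # Assumed log-linear mean function and dispersion parameter.
    beta0, beta1, alpha = -2.5, 0.8, 1.0
    mu = np.exp(beta0 + beta1 * np.log(x))

    # Negative binomial counts via a gamma-Poisson mixture:
    # the gamma multiplier has mean 1 and variance alpha.
    multiplier = rng.gamma(shape=1.0 / alpha, scale=alpha, size=mu.shape)
    y = rng.poisson(mu * multiplier)

    # Temporal aggregation: sum the counts and average the covariate
    # over all periods at each site.
    y_agg = y.sum(axis=1)
    x_agg = x.mean(axis=1)

    return {
        "zero_share_disagg": float((y == 0).mean()),
        "zero_share_agg": float((y_agg == 0).mean()),
        "cv_x_disagg": float(coefficient_of_variation(x.ravel())),
        "cv_x_agg": float(coefficient_of_variation(x_agg)),
    }


result = simulate_and_aggregate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, aggregation lowers the share of zero counts but also shrinks the coefficient of variation of the covariate, which is the quantity the abstract identifies as a criterion for choosing the aggregation level. Fitting a count model to both versions of the data would be the natural next step for comparing the precision of the coefficient estimates.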
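Title (translated from the Chinese):
Simulation analysis of spatial and temporal aggregation for analyzing safety datasets with excessive zero observations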