Maximum interpolable gap length in missing smartphone-based GPS mobility data
Transportation (IF 3.5), Pub Date: 2022-09-15, DOI: 10.1007/s11116-022-10328-2
Danielle McCool, Peter Lugtig, Barry Schouten

Passively-generated location data have the potential to augment mobility and transportation research, as demonstrated by a decade of research. A common trait of these data is a high proportion of missingness. Naïve handling, including list-wise deletion of subjects or days, or linear interpolation across time gaps, has the potential to bias summary results. On the other hand, it is unfeasible to collect mobility data at frequencies high enough to reflect all possible movements. In this paper, we describe the relationship between the temporal and spatial aspects of these data gaps, and illustrate the impact on measures of interest in the field of mobility. We propose a method to deal with missing location data that combines a so-called top-down ratio segmentation method with simple linear interpolation. The linear interpolation imputes missing data. The segmentation method transforms the set of location points to a series of lines, called segments. The method is designed for relatively short gaps, but is evaluated also for longer gaps. We study the effect of our imputation method for the duration of missing data using a completely observed subset of observations from the 2018 Statistics Netherlands travel study. We find that long gaps demonstrate greater downward bias on travel distance, movement events and radius of gyration as compared to shorter but more frequent gaps. When the missingness is unrelated to travel behavior, total sparsity can reach levels of up to 20% with gap lengths of up to 10 min while maintaining a maximum 5% downward bias in the metrics of interest. Temporal aspects can increase these limits; sparsity occurring in the evening or night hours is less biasing due to fewer travel behaviors.
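The gap-limited linear interpolation and the radius of gyration mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it omits the top-down ratio segmentation step entirely, and the function names, the 60-second fill step, the tuple layout `(t_seconds, lat, lon)`, and the use of a mean latitude/longitude centroid are all assumptions made for illustration. The 10-minute cap simply echoes the gap length quoted in the abstract.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000   # mean Earth radius
MAX_GAP_S = 10 * 60          # assumption: cap taken from the abstract's 10 min figure


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def interpolate_gaps(points, step_s=60, max_gap_s=MAX_GAP_S):
    """Linearly fill short gaps in a time-sorted list of (t_seconds, lat, lon).

    Gaps longer than max_gap_s are left unfilled (hypothetical policy).
    """
    if not points:
        return []
    filled = [points[0]]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(points, points[1:]):
        gap = t1 - t0
        if step_s < gap <= max_gap_s:
            t = t0 + step_s
            while t < t1:
                frac = (t - t0) / gap  # fraction of the gap elapsed at time t
                filled.append((t, lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)))
                t += step_s
        filled.append((t1, lat1, lon1))
    return filled


def radius_of_gyration_m(points):
    """Root-mean-square distance of the fixes from their centroid, in metres.

    The centroid is the plain mean of latitude and longitude, a rough
    approximation that is only reasonable for small study areas.
    """
    n = len(points)
    c_lat = sum(p[1] for p in points) / n
    c_lon = sum(p[2] for p in points) / n
    mean_sq = sum(haversine_m(p[1], p[2], c_lat, c_lon) ** 2 for p in points) / n
    return sqrt(mean_sq)


if __name__ == "__main__":
    track = [(0, 52.0, 5.0), (180, 52.003, 5.003), (2000, 52.1, 5.1)]
    filled = interpolate_gaps(track)
    print([p[0] for p in filled])        # timestamps after filling
    print(radius_of_gyration_m(filled))  # metres
```

In this sketch the 180-second gap is filled at one-minute intervals, while the gap of roughly 30 minutes is left alone, reflecting the idea that interpolation is only trusted up to a maximum gap length.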




Updated: 2022-09-17