A framework for parallel map-matching at scale using Spark
Distributed and Parallel Databases (IF 1.5), Pub Date: 2018-11-10, DOI: 10.1007/s10619-018-7254-0
Douglas Alves Peixoto, Hung Quoc Viet Nguyen, Bolong Zheng, Xiaofang Zhou

Map-matching is the problem of aligning recorded GPS trajectories with a digital representation of the road network. GPS data may be inaccurate and heterogeneous because of sensor limitations and errors, as well as legal restrictions. Accurately matching trajectories to the road map is an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the wide availability of GPS trajectories and map data challenges the scalability of current map-matching algorithms, which are limited to small datasets because they focus on matching accuracy rather than scalability. We therefore propose a distributed parallel framework for efficient and scalable offline map-matching on top of Spark. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation over large datasets. Spark, however, is still limited for dynamic map-matching, and its memory consumption can be an issue for very large datasets. Our framework enables map-matching on top of Spark while achieving horizontal scalability, using memory efficiently, and maintaining the accuracy of state-of-the-art matching algorithms, through three contributions: (1) we combine sampling-based Quadtree spatial partitioning with batch-based computation to achieve horizontal scalability of map-matching and to reduce cluster memory usage; (2) we employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects; and (3) we provide a cost function for the distributed map-matching workload that can be used to tune the framework parameters. Our extensive experiments demonstrate that the framework processes map-matching on large-scale data efficiently and scalably, while preserving matching accuracy and keeping memory usage low.
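The abstract does not spell out how the Quadtree partitions or the safe boundaries are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes planar coordinates, a leaf `capacity` threshold, and a fixed boundary `margin`. All names (`Box`, `build_leaves`, `home_leaf`, `leaves_needing`) are hypothetical, and plain Python stands in for Spark partitions. A small sample of points decides where the Quadtree splits, each GPS point gets exactly one home partition, and a road-network object is replicated to every partition whose margin-expanded box contains it, so that points near a partition border can still be matched to roads just across it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        # Half-open box: a point on a shared edge belongs to exactly one leaf.
        return self.min_x <= x < self.max_x and self.min_y <= y < self.max_y

    def expanded(self, margin):
        # Assumed "safe boundary": the box grown by a fixed margin on each side.
        return Box(self.min_x - margin, self.min_y - margin,
                   self.max_x + margin, self.max_y + margin)

    def quadrants(self):
        cx = (self.min_x + self.max_x) / 2
        cy = (self.min_y + self.max_y) / 2
        return [Box(self.min_x, self.min_y, cx, cy),
                Box(cx, self.min_y, self.max_x, cy),
                Box(self.min_x, cy, cx, self.max_y),
                Box(cx, cy, self.max_x, self.max_y)]


def build_leaves(box, sample, capacity, max_depth=12):
    """Split a box until each leaf holds at most `capacity` sampled points."""
    inside = [(x, y) for x, y in sample if box.contains(x, y)]
    if len(inside) <= capacity or max_depth == 0:
        return [box]
    leaves = []
    for quad in box.quadrants():
        leaves.extend(build_leaves(quad, inside, capacity, max_depth - 1))
    return leaves


def home_leaf(leaves, x, y):
    """Index of the single partition that owns a GPS point (None if outside)."""
    for i, leaf in enumerate(leaves):
        if leaf.contains(x, y):
            return i
    return None


def leaves_needing(leaves, x, y, margin):
    """Indices of all partitions that should receive a copy of a road object."""
    return [i for i, leaf in enumerate(leaves)
            if leaf.expanded(margin).contains(x, y)]


# Usage: build the partitioning from a 10% sample, then assign all points.
random.seed(7)
points = [(random.random(), random.random()) for _ in range(2000)]
sample = random.sample(points, 200)
leaves = build_leaves(Box(0.0, 0.0, 1.0, 1.0), sample, capacity=25)
assignment = [home_leaf(leaves, x, y) for x, y in points]
```

Building the tree from a sample rather than the full dataset keeps construction cheap while still adapting leaf sizes to data density. In a Spark setting the leaf index would plausibly serve as the partition key, but that mapping is likewise an assumption here.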

Updated: 2018-11-10
