Distributed mining of convoys in large scale datasets
GeoInformatica ( IF 2.2 ) Pub Date : 2021-02-24 , DOI: 10.1007/s10707-020-00431-w
Faisal Orakzai , Torben Bach Pedersen , Toon Calders

The tremendous increase in the use of mobile devices equipped with GPS and other location sensors has resulted in the generation of huge amounts of movement data. In recent years, mining this data to understand the collective mobility behavior of humans, animals, and other objects has become popular. Numerous mobility patterns and their mining algorithms have been proposed, each representing a specific movement behavior. The convoy pattern is one such pattern; it can be used to find groups of people moving together in public transport or to prevent traffic jams. A convoy is a set of at least m objects moving together for at least k consecutive timestamps, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns do not scale to real-life dataset sizes. In this paper, we therefore propose a generic distributed convoy pattern mining algorithm called DCM and show how such an algorithm can be implemented using the MapReduce framework. We present a cost model for DCM and a detailed theoretical analysis backed by experimental results. We show the effect of partition size on the performance of DCM. The results of our experiments on different datasets and hardware setups show that our distributed algorithm is scalable in terms of data size and number of nodes, and more efficient than any existing sequential or distributed convoy pattern mining algorithm, showing speed-ups of up to 16 times over SPARE, the state-of-the-art distributed co-movement pattern mining framework. DCM is thus able to process large datasets that SPARE cannot.
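To make the convoy definition concrete, the following is a minimal sequential sketch in Python. It is not the DCM algorithm and not the authors' implementation. It assumes that "moving together" has already been decided for each timestamp and is supplied as a list of groups (for example, the output of some clustering step, which the abstract does not specify). The function name `mine_convoys`, its input layout, and the sample data are all hypothetical. The sketch also does not filter out convoys that are subsets of longer or larger ones.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A convoy is reported as (sorted member ids, first timestamp, last timestamp).
Convoy = Tuple[Tuple[str, ...], int, int]


def mine_convoys(clusters_by_ts: List[List[Set[str]]], m: int, k: int) -> List[Convoy]:
    """Find groups of >= m objects that share a cluster for >= k consecutive timestamps.

    clusters_by_ts[t] is assumed to hold the groups of objects that are
    "together" at timestamp t (an assumption of this sketch).
    """
    found: Set[Tuple[FrozenSet[str], int, int]] = set()
    # Open candidates: member set -> earliest timestamp since which it stayed together.
    candidates: Dict[FrozenSet[str], int] = {}

    for t, clusters in enumerate(clusters_by_ts):
        next_candidates: Dict[FrozenSet[str], int] = {}
        continued_intact: Set[FrozenSet[str]] = set()

        # Try to extend every open candidate with every cluster at time t.
        for members, start in candidates.items():
            for cluster in clusters:
                common = frozenset(members & cluster)
                if len(common) >= m:
                    earlier = next_candidates.get(common, start)
                    next_candidates[common] = min(start, earlier)
                    if common == members:
                        continued_intact.add(members)

        # A candidate that lost members (or vanished) ended at t - 1.
        for members, start in candidates.items():
            if members not in continued_intact and (t - 1) - start + 1 >= k:
                found.add((members, start, t - 1))

        # Every sufficiently large cluster may start a new candidate at t.
        for cluster in clusters:
            if len(cluster) >= m:
                next_candidates.setdefault(frozenset(cluster), t)

        candidates = next_candidates

    # Close candidates that are still open at the end of the data.
    last = len(clusters_by_ts) - 1
    for members, start in candidates.items():
        if last - start + 1 >= k:
            found.add((members, start, last))

    result = [(tuple(sorted(members)), start, end) for members, start, end in found]
    return sorted(result, key=lambda c: (c[1], c[2], c[0]))


if __name__ == "__main__":
    # Hypothetical data: four timestamps, five objects.
    sample = [
        [{"a", "b", "c"}, {"d", "e"}],
        [{"a", "b", "c"}, {"d", "e"}],
        [{"a", "b"}, {"c", "d", "e"}],
        [{"a", "b"}, {"c"}, {"d", "e"}],
    ]
    for convoy in mine_convoys(sample, m=2, k=3):
        print(convoy)
```

With m = 2 and k = 3 on the sample data, objects a and b stay together over all four timestamps, as do d and e, while the group a, b, c lasts only two timestamps and is therefore not reported. A distributed algorithm such as the one the abstract describes would additionally have to partition the data and merge convoys that span partition boundaries, which this sketch does not attempt.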




Updated: 2021-02-25