HadoopTrajectory: a Hadoop spatiotemporal data processing extension, Journal of Geographical Systems



Journal of Geographical Systems (IF 2.417), Pub Date: 2019-02-01, DOI: 10.1007/s10109-019-00292-4
Mohamed Bakli, Mahmoud Sakr, Taysir Hassan A. Soliman

Recent advances in location tracking technologies and the widespread use of location-aware applications have produced big datasets of moving object trajectories. While a few research prototypes for moving object databases exist, there is a lack of systems that can process big spatiotemporal data. This work proposes HadoopTrajectory, a Hadoop extension for spatiotemporal data processing. The extension adds spatiotemporal types and operators to the Hadoop core. These types and operators can be used directly in MapReduce programs, which lets Hadoop users write spatiotemporal data analytics programs. The storage layer of Hadoop, HDFS, is extended with types that represent trajectory data, together with their corresponding input and output functions. It is also extended with file splitters and record readers. This enables Hadoop to read big files of moving object trajectories, such as vehicle GPS tracks, and split them over worker nodes for distributed processing. The storage layer is further extended with spatiotemporal indexes that help filter the data before it is split over the worker nodes. Several data access functions are provided so that the MapReduce layer can work with this data. The MapReduce layer is extended with trajectory processing operators, for instance to compute the length of a trajectory in meters. The paper describes the extension and evaluates it using a synthetic dataset and a real dataset. Comparisons with non-Hadoop systems and with standard Hadoop are given. The extension comprises about 11,601 lines of Java code.
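To make the "length of a trajectory in meters" operator concrete, here is a minimal plain-Java sketch of one way such a computation could be done. It is not HadoopTrajectory's code or API: the class `TrajectoryLengthSketch`, the `Sample` type, and both method names are hypothetical, and the abstract does not say which distance formula the extension uses. The sketch assumes samples are WGS84 latitude/longitude pairs already sorted by time, and it approximates the Earth as a sphere (haversine formula).

```java
import java.util.List;

// Hypothetical sketch: none of these names come from HadoopTrajectory.
final class TrajectoryLengthSketch {

    // Mean Earth radius in meters (spherical approximation, an assumption).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    // One GPS sample: latitude/longitude in degrees, timestamp in epoch milliseconds.
    static final class Sample {
        final double latDeg;
        final double lonDeg;
        final long epochMillis;

        Sample(double latDeg, double lonDeg, long epochMillis) {
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
            this.epochMillis = epochMillis;
        }
    }

    // Great-circle distance between two points in meters (haversine formula).
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Trajectory length: sum of distances between consecutive samples.
    // Assumes the list is already ordered by timestamp; 0 or 1 samples give 0.0.
    static double lengthMeters(List<Sample> samples) {
        double total = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            Sample p = samples.get(i - 1);
            Sample q = samples.get(i);
            total += haversineMeters(p.latDeg, p.lonDeg, q.latDeg, q.lonDeg);
        }
        return total;
    }
}
```

Under these assumptions, two samples one degree of longitude apart on the equator give a length of roughly 111.2 km. In a MapReduce setting, a function like `lengthMeters` would typically be called per trajectory inside a map or reduce step, once the record reader has assembled the samples of one moving object.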


Updated: 2019-02-01
