GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams
arXiv - CS - Databases Pub Date : 2020-04-07 , DOI: arxiv-2004.03352
Salman Ahmed Shaikh, Komal Mariam, Hiroyuki Kitagawa, Kyoung-Sook Kim

Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which many applications dealing with spatial data require. Other scalable spatial data processing platforms, including GeoSpark and Spatial Hadoop, do not support streaming workloads and can handle only static/batch workloads. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial data types, indexes, and continuous queries over spatial data streams. To process spatial continuous queries efficiently and to distribute data effectively across Flink cluster nodes, a grid-based index is introduced. GeoFlink currently supports spatial range, spatial $k$NN, and spatial join queries on the point data type. An extensive experimental study on real spatial data streams shows that GeoFlink achieves significantly higher query throughput than ordinary Flink processing.
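To make the idea of a grid-based index concrete, the following is a minimal Python sketch of how a uniform grid could prune a spatial range query over points. It is an illustration under stated assumptions, not GeoFlink's actual implementation or API: the class `UniformGrid`, the function `range_query`, and all parameter names are hypothetical, and the abstract does not specify the grid layout or how cells are selected.

```python
import math
from collections import defaultdict


class UniformGrid:
    """Hypothetical uniform grid over a rectangular bounding box (an assumption, not GeoFlink's API)."""

    def __init__(self, min_x, min_y, max_x, max_y, cells_per_side):
        self.min_x, self.min_y = min_x, min_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side

    def _clamp(self, i):
        # Keep indices inside the grid so out-of-bounds coordinates map to border cells.
        return max(0, min(self.n - 1, i))

    def cell_of(self, x, y):
        # Map a point to its (column, row) cell; this key could also serve as a partitioning key.
        col = self._clamp(int((x - self.min_x) // self.cell_w))
        row = self._clamp(int((y - self.min_y) // self.cell_h))
        return (col, row)

    def candidate_cells(self, qx, qy, r):
        # Cells overlapping the bounding square of the query circle; all other cells are pruned.
        c0, r0 = self.cell_of(qx - r, qy - r)
        c1, r1 = self.cell_of(qx + r, qy + r)
        return {(c, w) for c in range(c0, c1 + 1) for w in range(r0, r1 + 1)}


def range_query(grid, points, qx, qy, r):
    """Return the points within distance r of (qx, qy), checking only candidate cells."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[grid.cell_of(x, y)].append((x, y))
    result = []
    for cell in grid.candidate_cells(qx, qy, r):
        for x, y in buckets.get(cell, []):
            # Exact distance check on the points that survive cell-level pruning.
            if math.hypot(x - qx, y - qy) <= r:
                result.append((x, y))
    return sorted(result)


grid = UniformGrid(0, 0, 10, 10, cells_per_side=10)
points = [(1, 1), (1.5, 1.2), (9, 9), (5, 5)]
print(range_query(grid, points, qx=1, qy=1, r=1.0))
```

In this sketch only the 9 cells around the query point are inspected, so the points at (5, 5) and (9, 9) are never compared against the query. In a distributed setting, a cell key of this kind could plausibly double as the key for distributing points across nodes, though the abstract does not state how GeoFlink does this.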

Updated: 2020-08-04