GeoSparkViz: a cluster computing system for visualizing massive-scale geospatial data
The VLDB Journal (IF 4.2), Pub Date: 2021-01-07, DOI: 10.1007/s00778-020-00645-2
Jia Yu, Mohamed Sarwat

In the last decade, geospatial data extracted from GPS traces and satellite images has become ubiquitous. GeoVisual analytics (GeoViz) is the science of analytical reasoning assisted by geospatial map interfaces. GeoViz involves two phases: (1) spatial data processing, which loads spatial data and executes spatial queries to return the set of spatial objects to be visualized; and (2) map visualization, which applies a map visualization effect, e.g., a heatmap, to the spatial objects produced in the first phase. Existing GeoViz system architectures decouple these two phases and thereby lose the opportunity to co-optimize data processing and map visualization in the same cluster. To remedy this, the paper presents GeoSparkViz, a full-fledged system that allows the user to load, process, integrate, and execute GeoViz tasks on spatial data at scale. GeoSparkViz extends a state-of-the-art distributed data management system to provide native support for general geospatial map visualization. The system encapsulates the main steps of the map visualization process, e.g., pixelizing spatial objects, aggregating pixels, and rendering map tiles, into a set of massively parallelized map building operators. This allows the system to co-optimize the spatial query operators and map building operators side by side. GeoSparkViz is also equipped with a GeoViz-aware spatial partitioning operator that achieves load balancing for GeoViz workloads among all nodes in the cluster. Experiments based on an implementation in Spark show that GeoSparkViz achieves up to an order of magnitude less data-to-visualization time than its counterparts when running visual analytics tasks over large-scale spatial data extracted from the NYC taxi dataset and OpenStreetMap.




Updated: 2021-01-07