Cost estimation of spatial join in SpatialHadoop
GeoInformatica (IF 2.2), Pub Date: 2020-07-05, DOI: 10.1007/s10707-020-00414-x
A. Belussi, S. Migliorini, A. Eldawy

Spatial join is an important operation in geo-spatial applications, since it is frequently used in data analysis involving geographical information. Much effort has been devoted in past decades to providing efficient algorithms for spatial join, and this becomes particularly important as the amount of spatial data to be processed increases. In recent years, the MapReduce approach has become a de facto standard for processing large amounts of data (big data), and some attempts have been made to extend existing frameworks to the processing of spatial data. In this context, several different MapReduce implementations of spatial join have been defined, which mainly differ in the use of a spatial index and in the way this index is built and used. In general, none of these algorithms can be considered better than the others; the choice may depend on the characteristics of the datasets involved. The aim of this work is to analyse them in depth and to define a cost model for ranking them based on the characteristics of the datasets at hand (i.e., selectivity or spatial properties). This cost model has been extensively tested on a set of synthetic datasets in order to demonstrate its effectiveness.



Updated: 2020-07-05