Generalized communication cost efficient multi-way spatial join: revisiting the curse of the last reducer
GeoInformatica (IF 2.2), Pub Date: 2020-01-14, DOI: 10.1007/s10707-019-00387-6
S. Nagesh Bhattu, Avinash Potluri, Prashanth Kadari, Subramanyam R. B. V.

With the sharp rise in the use of smartphones, social media, and sensors, large volumes of location-based data are available. Location-based data carry important signals about user-intensive information as well as population characteristics. The key analytical tool for location-based analysis is the multi-way spatial join. Unlike conventional join strategies, a multi-way join using map-reduce offers a scalable, distributed computational paradigm and an efficient implementation through communication cost reduction strategies. Controlled Replicate (C-Rep) is a strategy used in the literature to perform the multi-way spatial join efficiently. Although C-Rep outperforms the naive sequential join, careful analysis of its performance reveals that it is plagued by the curse of the last reducer: the skew inherently present in the datasets, together with the skew introduced by the replication operation, causes some reducers to take much longer than others. In this work, we design an algorithm, GEMS (Generalized Communication cost Efficient Multi-Way Spatial Join), to address the skewness inherent in the connectivity of spatial objects while performing a multi-way join. We analyse all the algorithms concerned in terms of I/O and communication costs. We prove that the communication cost of the GEMS approach is better than that of C-Rep by a factor of O(α), where α is the number of reducers in a single row/column of a grid of reducers. Our experimental results on different datasets indicate that the GEMS approach is three times better than C-Rep in terms of turnaround time.
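To make the "curse of the last reducer" concrete, the sketch below shows how replicating rectangles across an α × α grid of reducers can load some reducers far more heavily than others. It is an illustration only and is neither C-Rep nor GEMS: the function names (`cells_for_rect`, `reducer_loads`, `skew`), the unit-square extent, and the rule "send a rectangle to every cell it overlaps" are all assumptions made for this example.

```python
from collections import Counter

def cells_for_rect(rect, alpha, extent=1.0):
    """Grid cells (row, col) overlapped by rect = (x1, y1, x2, y2).

    Assumes the data space is the square [0, extent] x [0, extent],
    split into an alpha x alpha grid with one reducer per cell.
    """
    x1, y1, x2, y2 = rect
    size = extent / alpha
    # Clamp so that coordinates on the upper boundary fall in the last cell.
    c1 = min(int(x1 / size), alpha - 1)
    c2 = min(int(x2 / size), alpha - 1)
    r1 = min(int(y1 / size), alpha - 1)
    r2 = min(int(y2 / size), alpha - 1)
    return [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

def reducer_loads(rects, alpha):
    """Count how many rectangle copies each reducer (grid cell) receives."""
    loads = Counter()
    for rect in rects:
        for cell in cells_for_rect(rect, alpha):
            loads[cell] += 1  # one shipped copy per overlapped cell
    return loads

def skew(loads, alpha):
    """Ratio of the busiest reducer's load to the mean load over all reducers."""
    mean = sum(loads.values()) / (alpha * alpha)
    return max(loads.values()) / mean

# Hypothetical input: two small rectangles clustered in one corner,
# one rectangle straddling the grid centre, and one in the opposite corner.
rects = [
    (0.1, 0.1, 0.2, 0.2),
    (0.1, 0.1, 0.3, 0.3),
    (0.4, 0.4, 0.6, 0.6),
    (0.7, 0.7, 0.8, 0.8),
]
loads = reducer_loads(rects, alpha=2)
print(dict(loads))
print("copies shipped:", sum(loads.values()))
print("skew:", skew(loads, 2))
```

In this toy input, four rectangles become seven shipped copies, and the reducer for cell (0, 0) receives three of them. The job's turnaround time is set by that slowest reducer, which is the effect the abstract attributes to data skew combined with replication.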

Updated: 2020-01-14