Top-k spatial distance joins
GeoInformatica (IF 2), Pub Date: 2020-02-12, DOI: 10.1007/s10707-020-00393-z
Shuyao Qi, Panagiotis Bouros, Nikos Mamoulis

Top-k joins have been extensively studied when numerical-valued attributes are joined on an equality predicate. Other types of join attributes and predicates have received little to no attention. In this paper, we consider spatial objects that are assigned a score (e.g., a ranking). Given two collections R and S of such objects and a spatial distance threshold 𝜖, we introduce the top-k spatial distance join (k-SDJoin), which identifies the k pairs of objects with the highest combined score (based on an aggregate function γ) among all object pairs in R × S with a spatial distance of at most 𝜖. State-of-the-art methods for relational top-k joins can be adapted for k-SDJoin, but their focus is on minimizing the number of objects accessed from the inputs; when spatial objects are joined, however, the computational cost can easily become the bottleneck. In view of this, we propose a novel evaluation algorithm that greatly reduces the computational cost without compromising the access cost. The main idea is to access and efficiently join blocks of objects from each collection, using appropriate bounds to avoid computing the entire spatial 𝜖-distance join. As the performance of our solution relies heavily on the size of the input blocks, we devise an approach for automated block size tuning, enhanced by a novel generic model for estimating the number of objects to be accessed from each input. Contrary to previous efforts, our model employs cheap-to-compute statistics and requires no prior knowledge of the data distribution. Our extensive experimental analysis demonstrates the efficiency of our algorithm compared to methods based on existing literature that prioritize either the ranking or the spatial join component of k-SDJoin queries.
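
The block-wise idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic threshold-style evaluation written under several assumptions that the abstract does not state. Objects are taken to be `(x, y, score)` tuples, distance is Euclidean, γ defaults to a sum and must be monotone, and each block is joined with a plain nested loop rather than an efficient spatial join. The names `within`, `brute_force`, and `block_k_sdjoin` are hypothetical.

```python
import heapq
import math
from itertools import count


def within(r, s, eps):
    """True if the Euclidean distance between points r and s is at most eps."""
    return math.hypot(r[0] - s[0], r[1] - s[1]) <= eps


def brute_force(R, S, k, eps, gamma):
    """Reference answer: score every qualifying pair, keep the k best."""
    pairs = [(gamma(r[2], s[2]), r, s) for r in R for s in S if within(r, s, eps)]
    return heapq.nlargest(k, pairs, key=lambda p: p[0])


def block_k_sdjoin(R, S, k, eps, block=2, gamma=lambda a, b: a + b):
    """Block-wise top-k distance join over (x, y, score) tuples.

    Assumes gamma is monotone (non-decreasing in both arguments).
    """
    if k <= 0 or not R or not S:
        return []
    # Access both inputs in descending score order.
    R = sorted(R, key=lambda o: o[2], reverse=True)
    S = sorted(S, key=lambda o: o[2], reverse=True)
    i = j = 0                  # objects accessed so far from R and from S
    heap, tie = [], count()    # min-heap holding the current top-k pairs

    def offer(r, s):
        if not within(r, s, eps):
            return
        item = (gamma(r[2], s[2]), next(tie), r, s)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            heapq.heapreplace(heap, item)

    while i < len(R) or j < len(S):
        # Pull the next block from the input that has been accessed less.
        if j >= len(S) or (i < len(R) and i <= j):
            blk = R[i:i + block]
            for r in blk:
                for s in S[:j]:      # join with everything seen from S
                    offer(r, s)
            i += len(blk)
        else:
            blk = S[j:j + block]
            for s in blk:
                for r in R[:i]:      # join with everything seen from R
                    offer(r, s)
            j += len(blk)
        # Upper bound on the score of any pair that involves an unseen object.
        bounds = []
        if i < len(R):
            bounds.append(gamma(R[i][2], S[0][2]))
        if j < len(S):
            bounds.append(gamma(R[0][2], S[j][2]))
        # Stop once the k-th best pair cannot be beaten by unseen pairs.
        if bounds and len(heap) == k and heap[0][0] >= max(bounds):
            break
    return [(score, r, s) for score, _, r, s in sorted(heap, reverse=True)]
```

Calling `block_k_sdjoin(R, S, k=3, eps=2.0, block=4)` returns up to three `(combined_score, r, s)` tuples in descending score order. The `block` parameter plays the role of the block size that the abstract says must be tuned: larger blocks mean fewer bound checks but more objects accessed before the loop can stop.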




Updated: 2020-02-12