Pruning techniques for parallel processing of reverse top-k queries
Distributed and Parallel Databases (IF 1.2), Pub Date: 2020-05-25, DOI: 10.1007/s10619-020-07297-9
Panagiotis Nikitopoulos, Georgios A. Sfyris, Akrivi Vlachou, Christos Doulkeridis, Orestis Telelis

In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature makes processing prohibitively expensive on very large data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large, distributed input data sets in reasonable execution time. We present an algorithmic framework for the problem that targets a generic parallel setting and in which different algorithms can be instantiated. As one instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ using data sets that are four orders of magnitude larger than those handled by centralized approaches.
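To make the query definition concrete, the following is a minimal brute-force sketch of a reverse top-k query. It is not DiPaRT or DiPaRT+ and applies no pruning or parallelism. It rests on assumptions the abstract does not state: objects and preferences are numeric vectors, a preference scores an object by a weighted sum, lower scores rank higher, and q counts as a top-k result when fewer than k objects score strictly better. The names `score` and `reverse_top_k` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def score(w: Vector, p: Vector) -> float:
    """Weighted-sum score of object p under preference w (assumed scoring model)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def reverse_top_k(objects: List[Vector], preferences: List[Vector],
                  q: Vector, k: int) -> List[Vector]:
    """Return the preferences for which q belongs to the top-k results.

    Assumption: lower scores are better, and q is in the top-k for w
    when fewer than k objects score strictly lower than q under w.
    """
    result = []
    for w in preferences:
        q_score = score(w, q)
        # Count the objects that rank strictly ahead of q for this preference.
        better = sum(1 for p in objects if score(w, p) < q_score)
        if better < k:
            result.append(w)
    return result


objects = [(1, 5), (2, 2), (5, 1)]
preferences = [(0.5, 0.5), (1, 0), (0, 1)]
q = (3, 3)

# Under (0.5, 0.5) only (2, 2) beats q, so q is in the top-2 for that preference.
print(reverse_top_k(objects, preferences, q, k=2))
```

Each preference requires a full pass over the objects, so the cost grows with the product of the two input sizes. That is the kind of CPU-intensive work the abstract says becomes prohibitive on very large data sets.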

Updated: 2020-05-25