Abstract
In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature makes processing prohibitively expensive when it is applied to vast data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large, distributed sets of input data in reasonable execution time. We present an algorithmic framework for the problem, targeting a generic parallel setting, in which different algorithms can be instantiated. As one instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ on data sets that are four orders of magnitude larger than those handled by centralized approaches.
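To make the query semantics concrete, here is a minimal brute-force sketch. It assumes a linear scoring function f_w(p) = Σ w[i]·p[i] where smaller scores are better, and it treats q as belonging to the top-k of a preference w when fewer than k data objects score strictly better than q (ties resolved in q's favour). These are assumptions of the sketch, not details taken from the paper. The functions `reverse_top_k`, `map_partition` and `reverse_top_k_partitioned` are hypothetical names. This is not DiPaRT or DiPaRT+. The partitioned variant only illustrates why the count of "objects beating q" decomposes additively across data partitions, which is what makes a map/reduce-style evaluation possible at all.

```python
from typing import List, Sequence

Point = Sequence[float]  # a data object or a preference weight vector


def score(w: Point, p: Point) -> float:
    # Assumed linear scoring function f_w(p) = sum_i w[i] * p[i]; smaller is better.
    return sum(wi * pi for wi, pi in zip(w, p))


def reverse_top_k(data: List[Point], prefs: List[Point], q: Point, k: int) -> List[int]:
    # Brute force: for every preference w, count the objects that score strictly
    # better than q; q is in the top-k of w iff fewer than k objects beat it
    # (ties are resolved in q's favour, an assumption of this sketch).
    result = []
    for j, w in enumerate(prefs):
        fq = score(w, q)
        beating = sum(1 for p in data if score(w, p) < fq)
        if beating < k:
            result.append(j)  # report the preference by its index
    return result


def map_partition(partition: List[Point], prefs: List[Point], q: Point) -> List[int]:
    # Hypothetical "map" step: on one data partition, emit for each preference
    # the number of local objects that beat q.
    fqs = [score(w, q) for w in prefs]
    return [sum(1 for p in partition if score(w, p) < fq) for w, fq in zip(prefs, fqs)]


def reverse_top_k_partitioned(partitions: List[List[Point]], prefs: List[Point],
                              q: Point, k: int) -> List[int]:
    # Hypothetical "reduce" step: per-partition counts are additive, so summing
    # them yields the global count; keep the preferences whose total is below k.
    totals = [0] * len(prefs)
    for part in partitions:
        totals = [t + c for t, c in zip(totals, map_partition(part, prefs, q))]
    return [j for j, total in enumerate(totals) if total < k]


if __name__ == "__main__":
    data = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
    prefs = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
    q = (2.5, 2.5)
    print(reverse_top_k(data, prefs, q, k=2))                              # [1]
    print(reverse_top_k_partitioned([data[:2], data[2:]], prefs, q, k=2))  # [1]
```

In the example, only the balanced preference (0.5, 0.5) has fewer than two objects scoring strictly below q, so it is the only preference returned. The sketch scores every object against every preference. That exhaustive cost is the kind of work that pruning properties are meant to reduce.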
Notes
We explicitly state that our work targets offline processing of reverse top-k queries based on all available data objects and user preferences at a given time point.
Source code available at: https://github.com/nikpanos/rtopk.distributed
Acknowledgements
This research work has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under Grant Agreement No. 1667 and under the HFRI PhD Fellowship grant (GA. No. 1059), and from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 780754. The authors are grateful to Kjetil Nørvåg (NTNU) for providing access to the cluster infrastructure used for the empirical evaluation reported in the paper.
About this article
Cite this article
Nikitopoulos, P., Sfyris, G.A., Vlachou, A. et al. Pruning techniques for parallel processing of reverse top-k queries. Distrib Parallel Databases 39, 169–199 (2021). https://doi.org/10.1007/s10619-020-07297-9