Pruning techniques for parallel processing of reverse top-k queries

Nikitopoulos, Panagiotis; Sfyris, Georgios A.; Vlachou, Akrivi; Doulkeridis, Christos; Telelis, Orestis

doi:10.1007/s10619-020-07297-9

Published: 25 May 2020

Volume 39, pages 169–199 (2021)

Distributed and Parallel Databases

Abstract

In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature results in prohibitively expensive processing cost when applied to very large data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large, distributed sets of input data in reasonable execution time. We present an algorithmic framework for the problem, targeting a generic parallel setting, in which different algorithms can be instantiated. As one instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ using data sets that are four orders of magnitude larger than those handled by centralized approaches.
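To make the query definition above concrete, the following is a minimal brute-force sketch in Python. It is not DiPaRT or DiPaRT+ and uses no pruning or parallelism. It rests on assumptions that the abstract does not state: each user preference is a weight vector, an object's score is the weighted sum of its attributes, lower scores rank higher, and ties are resolved in favor of the query object. The function names `score`, `in_top_k`, and `reverse_top_k` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def score(w: Vector, p: Vector) -> float:
    """Weighted-sum score of object p under preference w (assumed scoring function)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def in_top_k(w: Vector, q: Vector, objects: Sequence[Vector], k: int) -> bool:
    """Return True if q is in the top-k for w, assuming lower scores rank higher.

    q qualifies when fewer than k objects score strictly lower than q,
    so ties are resolved in favor of q (an assumption of this sketch).
    """
    q_score = score(w, q)
    better = 0
    for p in objects:
        if score(w, p) < q_score:
            better += 1
            if better >= k:  # k objects already beat q; stop early
                return False
    return True


def reverse_top_k(
    objects: Sequence[Vector],
    preferences: Sequence[Vector],
    q: Vector,
    k: int,
) -> List[Vector]:
    """Brute force: test every preference against every object."""
    return [w for w in preferences if in_top_k(w, q, objects, k)]


# Illustrative (made-up) data: three objects, three preferences, one query object.
objects = [(1.0, 9.0), (5.0, 5.0), (9.0, 1.0)]
preferences = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
q = (4.0, 4.0)

result = reverse_top_k(objects, preferences, q, k=1)
print(result)
```

Each preference is checked against every object, so the work grows with the product of the two set sizes. This is the cost that the abstract describes as prohibitive for very large inputs.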

Notes

We explicitly state that our work targets offline processing of reverse top-k queries based on all available data objects and user preferences at a given time point.
Source code available at: https://github.com/nikpanos/rtopk.distributed

Acknowledgements

This research work has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under Grant Agreement No. 1667 and under the HFRI PhD Fellowship grant (GA. No. 1059), and from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 780754. The authors are grateful to Kjetil Nørvåg (NTNU) for providing access to the cluster infrastructure used for the empirical evaluation reported in the paper.

Author information

Authors and Affiliations

Department of Digital Systems, School of Information and Communication Technologies, University of Piraeus, 185 34, Piraeus, Greece
Panagiotis Nikitopoulos, Georgios A. Sfyris, Akrivi Vlachou, Christos Doulkeridis & Orestis Telelis

Corresponding author

Correspondence to Panagiotis Nikitopoulos.

About this article

Cite this article

Nikitopoulos, P., Sfyris, G.A., Vlachou, A. et al. Pruning techniques for parallel processing of reverse top-k queries. Distrib Parallel Databases 39, 169–199 (2021). https://doi.org/10.1007/s10619-020-07297-9

Issue Date: March 2021
DOI: https://doi.org/10.1007/s10619-020-07297-9
