
Pruning techniques for parallel processing of reverse top-k queries

Distributed and Parallel Databases

Abstract

In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature makes processing prohibitively expensive when it is applied to very large data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large, distributed input data sets within reasonable execution time. We present an algorithmic framework for the problem that targets a generic parallel setting and in which different algorithms can be instantiated. As an instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+, another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide an extensive set of experiments that demonstrate the improved efficiency of DiPaRT+ on data sets that are four orders of magnitude larger than those handled by centralized approaches.
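To make the query semantics concrete, the following Python code is a minimal brute-force sketch. It is not DiPaRT or DiPaRT+, whose pruning properties are not described here. It assumes linear scoring functions f_w(p) = Σ w[i]·p[i] in which lower scores are preferred, with ties resolved in q's favor; these conventions, and all function names, are assumptions of this sketch. The partitioned variant illustrates, under the same assumptions, why the problem lends itself to a data-parallel setting: the number of objects that beat q under a given w is additive across partitions of the object set.

```python
from typing import List, Sequence

# Assumption: objects and weighting vectors are tuples of numbers with the
# same dimensionality, scoring is linear, and LOWER scores are better.
Vector = Sequence[float]


def score(w: Vector, p: Vector) -> float:
    """Linear scoring function f_w(p) = sum_i w[i] * p[i]."""
    return sum(wi * pi for wi, pi in zip(w, p))


def count_better(objects: Sequence[Vector], w: Vector, q: Vector) -> int:
    """Count objects that score strictly better (lower) than q under w."""
    fq = score(w, q)
    return sum(1 for p in objects if score(w, p) < fq)


def reverse_topk(objects: Sequence[Vector], weights: Sequence[Vector],
                 q: Vector, k: int) -> List[int]:
    """Indices of weighting vectors whose top-k result contains q.

    q is in the top-k of w iff fewer than k objects beat it (ties favour q).
    """
    return [i for i, w in enumerate(weights) if count_better(objects, w, q) < k]


def reverse_topk_partitioned(partitions: Sequence[Sequence[Vector]],
                             weights: Sequence[Vector],
                             q: Vector, k: int) -> List[int]:
    """Same answer when the objects are split into partitions (e.g. per worker).

    Each partition yields a local count per weighting vector; because these
    counts are additive, summing them gives the global count.
    """
    totals = [0] * len(weights)
    for part in partitions:  # in a parallel setting, each partition is a separate task
        for i, w in enumerate(weights):
            totals[i] += count_better(part, w, q)
    return [i for i, t in enumerate(totals) if t < k]


if __name__ == "__main__":
    objects = [(1, 4), (2, 2), (4, 1), (3, 3)]
    weights = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
    q = (2, 3)
    print(reverse_topk(objects, weights, q, k=2))  # prints [1]
    print(reverse_topk_partitioned([objects[:2], objects[2:]], weights, q, k=2))  # prints [1]
```

Here only the weighting vector (0.5, 0.5) ranks q among its top-2 objects. The brute-force cost grows with the product of the number of objects and the number of weighting vectors, which is the cost that the framework and its pruning properties aim to reduce.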



Notes

  1. We explicitly state that our work targets offline processing of reverse top-k queries based on all available data objects and user preferences at a given time point.

  2. Source code available at: https://github.com/nikpanos/rtopk.distributed


Acknowledgements

This research work has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under Grant Agreement No. 1667 and under the HFRI PhD Fellowship grant (GA. No. 1059), and from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 780754. The authors are grateful to Kjetil Nørvåg (NTNU) for providing access to the cluster infrastructure used for the empirical evaluation reported in the paper.

Author information

Correspondence to Panagiotis Nikitopoulos.


About this article


Cite this article

Nikitopoulos, P., Sfyris, G.A., Vlachou, A. et al. Pruning techniques for parallel processing of reverse top-k queries. Distrib Parallel Databases 39, 169–199 (2021). https://doi.org/10.1007/s10619-020-07297-9
