Fast kNN query processing over a multi-node GPU environment

The Journal of Supercomputing

Abstract

The kNN (k-nearest-neighbors) search is applied in a wide range of areas, such as data mining, multimedia, information retrieval, machine learning, and pattern recognition. Most solutions for this type of search are restricted to metric spaces or limited to low-dimensional data. In this work, we introduce a novel GPU-based exhaustive algorithm to solve kNN queries. The algorithm takes as input a set of values (or measures) and returns the K lowest values from that set, so it can be used with measures obtained from metric and non-metric spaces as well as from high-dimensional databases. It is composed of two steps: the first uses pivots to reduce the search range, and the second uses a set of heaps as auxiliary structures to return the final results. We also extended the algorithm to run on a multi-GPU platform and on a multi-node/multi-GPU platform. To the best of our knowledge, and considering the state-of-the-art technical literature, this work uses the largest database (in terms of data volume) to process a kNN query, with up to 13,189 million elements, and achieves a speed-up of up to 1843× when using a 5-node/20-GPU platform.
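
The two steps described above can be illustrated with a small sequential sketch. This is not the authors' GPU implementation: the abstract does not specify how pivots are chosen or how the heaps are organized, so the random pivot sampling, the strided split across heaps, and the names `k_lowest`, `num_pivots`, and `num_heaps` are all assumptions made for illustration. The sketch only shows the general idea of first discarding values above a pivot and then selecting the K lowest survivors with several bounded heaps.

```python
import heapq
import random


def k_lowest(values, k, num_pivots=32, num_heaps=4, seed=0):
    """Return the k lowest values in ascending order (illustrative sketch)."""
    if k <= 0:
        return []
    if k >= len(values):
        return sorted(values)

    # Step 1 (assumed scheme): sample pivots and keep only the values at or
    # below the smallest pivot that still leaves at least k candidates.
    # If at least k values are <= pivot, the k lowest values are among them.
    rng = random.Random(seed)
    pivots = sorted(rng.sample(values, min(num_pivots, len(values))))
    candidates = values
    for pivot in pivots:
        below = [v for v in values if v <= pivot]
        if len(below) >= k:
            candidates = below
            break

    # Step 2 (assumed scheme): each strided slice of the candidates feeds its
    # own bounded max-heap of size k, standing in for one heap per GPU thread
    # group. heapq is a min-heap, so values are negated to emulate a max-heap.
    heaps = []
    for i in range(num_heaps):
        heap = []
        for v in candidates[i::num_heaps]:
            if len(heap) < k:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                heapq.heapreplace(heap, -v)
        heaps.append(heap)

    # Merge the partial results and keep the global k lowest values.
    merged = [-x for heap in heaps for x in heap]
    return heapq.nsmallest(k, merged)


if __name__ == "__main__":
    print(k_lowest([5, 3, 9, 1, 7], 2))  # → [1, 3]
```

Because every heap holds at most K elements, the final merge touches at most `num_heaps * k` values regardless of the input size, which is the property that makes a heap-per-worker layout attractive in a parallel setting.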

Notes

  1. OpenCL uses different names for the same constructs.

Acknowledgements

This research was funded by project FONDECYT REGULAR 2020 No 1200810 “Very Large Fingerprint Classification based on a Fast and Distributed Extreme Learning Machine”, and project FONDECYT DE INICIACIÓN No 11180881. Both projects are from Agencia Nacional de Investigación y Desarrollo, Ministerio de Ciencia, Tecnología, Conocimiento e Innovación, Gobierno de Chile.

Author information

Corresponding author

Correspondence to Ricardo J. Barrientos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Barrientos, R.J., Riquelme, J.A., Hernández-García, R. et al. Fast kNN query processing over a multi-node GPU environment. J Supercomput 78, 3045–3071 (2022). https://doi.org/10.1007/s11227-021-03975-2

  • DOI: https://doi.org/10.1007/s11227-021-03975-2
