A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

Nearest neighbor search is a powerful abstraction for data access; however, indexing the data is troublesome even for approximate indexes. For intrinsically high-dimensional data, fast, high-quality searches demand indexes with either impractically large memory usage or impractically long preprocessing time. In this paper, we introduce an algorithm that solves a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database. The minimization is performed with metaheuristics so that the problem is solved rapidly; although some methods in the literature use this strategy implicitly, our approach is the first to use it explicitly. We also provide two approaches for selecting edges during the graph’s construction stage that limit the memory footprint and, at the same time, reduce the number of free parameters. We carry out a thorough experimental comparison with state-of-the-art indexes on synthetic and real-world datasets; we found that our contributions achieve competitive speed, accuracy, and memory usage in almost all of our benchmarks.
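The idea of answering a query by minimizing the distance to q over a neighbor graph can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the objective is the plain distance to q, uses a brute-force k-nearest-neighbor graph, and lets greedy descent with random restarts stand in for the metaheuristics mentioned above. All function names (`build_knn_graph`, `greedy_search`, `search_with_restarts`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(points, k, dist=euclidean):
    """Brute-force k-nearest-neighbor graph (illustrative only, O(n^2))."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        graph.append(others[:k])
    return graph


def greedy_search(points, graph, q, start, dist=euclidean):
    """Move to the neighbor closest to q until no neighbor improves."""
    current, current_d = start, dist(q, points[start])
    while True:
        cand = min(graph[current], key=lambda j: dist(q, points[j]))
        cand_d = dist(q, points[cand])
        if cand_d >= current_d:
            # Local minimum of the distance to q over the graph.
            return current, current_d
        current, current_d = cand, cand_d


def search_with_restarts(points, graph, q, restarts=8, seed=0, dist=euclidean):
    """Assumed stand-in for a metaheuristic: keep the best of several descents."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        start = rng.randrange(len(points))
        result = greedy_search(points, graph, q, start, dist)
        if best is None or result[1] < best[1]:
            best = result
    return best


if __name__ == "__main__":
    pts = [(float(i),) for i in range(50)]
    g = build_knn_graph(pts, k=2)
    print(search_with_restarts(pts, g, (17.2,)))
```

A single greedy descent can stop at a local minimum that is not the true nearest neighbor, which is why the sketch keeps the best result over several random starting points; the result is approximate in general.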

Notes

  1. The collection was retrieved from http://corpus-texmex.irisa.fr/.

  2. Available at https://github.com/facebookresearch/faiss.

  3. Available at https://github.com/FALCONN-LIB/FALCONN.

  4. https://julialang.org/.

  5. Our source code is available at https://github.com/sadit/SimilaritySearch.jl.

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of this manuscript.

Author information

Corresponding author

Correspondence to Eric S. Tellez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tellez, E.S., Ruiz, G., Chavez, E. et al. A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs. Pattern Anal Applic 24, 763–777 (2021). https://doi.org/10.1007/s10044-020-00946-w

  • DOI: https://doi.org/10.1007/s10044-020-00946-w
