Abstract
Nearest neighbor search is a powerful abstraction for data access; however, indexing the data is troublesome even for approximate indexes. For intrinsically high-dimensional data, fast, high-quality searches demand indexes with either impractically large memory usage or impractically long preprocessing time. In this paper, we introduce an algorithm that solves a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database. The minimization is performed with metaheuristics over a neighbor graph so that the problem is solved rapidly; although some methods in the literature use this strategy implicitly, our approach is the first to use it explicitly. We also provide two approaches for selecting edges during the graph's construction stage that simultaneously limit the memory footprint and reduce the number of free parameters. We carry out a thorough experimental comparison with state-of-the-art indexes on synthetic and real-world datasets and find that our contributions achieve competitive performance in speed, accuracy, and memory in almost all of our benchmarks.
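The idea of answering a query by minimizing the distance to q over a graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_knn_graph`, `greedy_search`) are hypothetical, the graph is a brute-force k-nearest-neighbor graph rather than one built with the paper's edge-selection approaches, the objective is assumed to be the plain Euclidean distance, and simple hill climbing with random restarts stands in for the metaheuristics.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(db, k):
    """Brute-force k-nearest-neighbor graph (quadratic cost; illustration only)."""
    graph = []
    for i, u in enumerate(db):
        order = sorted((j for j in range(len(db)) if j != i),
                       key=lambda j: euclidean(u, db[j]))
        graph.append(order[:k])
    return graph


def greedy_search(db, graph, q, restarts=4, rng=None):
    """Minimize f(i) = d(q, db[i]) by hill climbing over graph neighborhoods.

    Returns (index, distance) of the best local minimum found; the result is
    approximate and may differ from the true nearest neighbor.
    """
    rng = rng or random.Random(0)
    best, best_dist = None, math.inf
    for _ in range(restarts):
        # Each restart begins at a random vertex (assumed strategy).
        cur = rng.randrange(len(db))
        cur_dist = euclidean(q, db[cur])
        while True:
            # Move to the neighbor closest to q, if it improves the objective.
            nb = min(graph[cur], key=lambda j: euclidean(q, db[j]))
            nb_dist = euclidean(q, db[nb])
            if nb_dist >= cur_dist:
                break  # local minimum: no neighbor is closer to q
            cur, cur_dist = nb, nb_dist
        if cur_dist < best_dist:
            best, best_dist = cur, cur_dist
    return best, best_dist


# Small demonstration on random 2-D points.
demo_rng = random.Random(42)
demo_db = [(demo_rng.random(), demo_rng.random()) for _ in range(200)]
demo_graph = build_knn_graph(demo_db, k=8)
demo_idx, demo_dist = greedy_search(demo_db, demo_graph, (0.5, 0.5), rng=demo_rng)
print(demo_idx, demo_dist)
```

In this sketch, each restart only evaluates distances along one descending path, so the query touches a small fraction of the database; the price is that the answer is a local minimum of the distance over the graph, which is why the number of restarts and the choice of edges matter for accuracy.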
Notes
The collection was retrieved from http://corpus-texmex.irisa.fr/.
Available at https://github.com/facebookresearch/faiss.
Available at https://github.com/FALCONN-LIB/FALCONN.
Our source code is available at https://github.com/sadit/SimilaritySearch.jl.
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of this manuscript.
Cite this article
Tellez, E.S., Ruiz, G., Chavez, E. et al. A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs. Pattern Anal Applic 24, 763–777 (2021). https://doi.org/10.1007/s10044-020-00946-w
DOI: https://doi.org/10.1007/s10044-020-00946-w