1 Introduction

Content-Based Image Retrieval (CBIR) is a research topic in the computer vision area. The problem focuses on searching a dataset for images similar to a query image and ranking the results by their similarity to it. The query image can be chosen from a collection or photographed with a mobile device. The problem seems simple to solve, but several challenges must be faced, the most significant being robustness to orientation, scale and occlusion. With the recent advent of features extracted by means of Convolutional Neural Networks (CNN), it has been possible to obtain remarkable results, mitigating the effect of these problems. In parallel, several new embedding strategies were proposed (Gordo et al. 2017; Magliani and Prati 2018; Tolias et al. 2016). The combination of new architectures for the feature extraction phase and new methods for the creation of global descriptors has made it possible to build effective and efficient pipelines for the CBIR problem, feasible even in the case of large-scale datasets with reasonable retrieval time (Magliani et al. 2019).

A recent breakthrough on this topic was made possible by an application of graph theory, the diffusion process, which allows outperforming the previous state of the art. In particular, the diffusion mechanism can be applied to the retrieval task with outstanding results (Iscen et al. 2017): instead of using distances in Euclidean space as in the brute-force case, the actual query neighbours can be found on the Riemannian manifold created by diffusion. This process exploits the distribution of the data over the manifold through the creation of a graph that represents the connections between dataset elements. The graph is mathematically represented by a pairwise affinity matrix (Zhou et al. 2004). Therefore, as also previously stated, the diffusion process requires the creation of a kNN graph of the embeddings used to represent the dataset images (Fig. 1). Of course, the quality of the embeddings influences the results that can be achieved by applying the diffusion process.

Fig. 1

In these three figures, two data distributions (orange and blue) are shown. In figure a the two distributions are depicted and the black point represents the query (which belongs to the blue class). In figure b, the Euclidean distance is applied to rank the neighbours (green points) of the query, but several false positives are detected. On the other hand, figure c shows the ideal ranking. Diffusion can help achieve this result, thanks to the propagation (diffusion) starting from the query image (Color figure online)

Once the kNN graph is created, the diffusion process works by finding (through random walks), for each node, the best path to reach the query, exploiting the weights of the traversed edges. The weights represent the similarity between the nodes connected by the edge (the greater the weight, the more similar the two nodes are).

Unfortunately, this approach also bears with it some drawbacks: (i) the setting of the diffusion parameters is hard, since they are dependent on the specific data distribution; and (ii) the time necessary to create the kNN graph can be unbearable for large datasets.

In practice, in order to apply diffusion, a kNN graph needs to be created, and its number of edges heavily influences the retrieval result. Moreover, it is hard to predict how connected the graph needs to be to achieve good results. Therefore, the straightforward solution is to fully connect the nodes through the so-called brute-force strategy. This strategy is very easy to implement, but it tends to be very slow in the case of large datasets. Given a dataset of \({\mathcal {N}}\) images, the brute-force graph will have \({\mathcal {N}}^2\) edges, meaning that for \({\mathcal {N}} = 100k\) the brute-force graph will have 10 billion edges.

As a consequence, different methods were proposed to solve this task in an approximate way, aiming to quickly create a high-quality approximate kNN graph (Chen et al. 2009; Dong et al. 2011; Sieranoja and Fränti 2018; Zhang et al. 2013) (see Sect. 2 for further details).

This paper is based on our previous work on the LSH kNN graph method (Magliani et al. 2019). Our proposed method, called LSH kNN graph, follows the principle that not all the connections between nodes in the graph are necessary. It uses Locality-Sensitive Hashing (LSH) projections (Indyk and Motwani 1998) to subdivide the images of the dataset into many subsamples, corresponding to the buckets created by the application of the LSH algorithm. Then, within each subsample, only the pairs of images with a similarity greater than a threshold are maintained and connected in the final graph. This process is repeated for each subsample and for each hash table. The trade-off between the quality of the graph and the creation time is an important parameter of this method. In particular, the LSH kNN graph reaches the same or better retrieval results than many state-of-the-art algorithms on several public image datasets, but in a much shorter time.

The main contributions of this paper are:

  • A complexity analysis is presented in order to support the soundness of the presented method.

  • Several code optimizations for the creation of the graph are shown, reducing both the computational time and the memory usage.

  • Experiments on several public image datasets, with comparisons against state-of-the-art methods.

  • Improvements in the quality of the graphs thanks to refinement techniques based on neighbour propagation. Our proposed technique, called sorted neighbour propagation, achieves better retrieval results after the diffusion application than many other graph refinement strategies.

This paper is organised as follows. Section 2 introduces the general techniques used in the state of the art, while Sect. 3 reports some background information about ranking with diffusion. Next, Sect. 4 describes the proposed algorithm, with a complete complexity analysis in Sect. 4.4. Moreover, some refinement techniques (Sect. 4.5) are described and tested in order to demonstrate how it is possible to improve the quality of the graph with limited extra effort. Section 4.6 reports the implementation details of the presented method, while Sect. 4.7 covers the parameter tuning of the LSH kNN graph approach. Then, Sect. 5 reports the experimental results on five public datasets: Oxford5k, \({\mathcal {R}}\)Oxford5k, Paris6k, \({\mathcal {R}}\)Paris6k and Oxford105k. Finally, concluding remarks are reported.

2 Related work

Recently, several graph applications in computer vision tasks have been proposed in the literature: diffusion for retrieval (Iscen et al. 2017), unsupervised or semi-supervised training (Douze et al. 2018; Iscen et al. 2018), image classification (Li et al. 2016) and manifold embedding (Xu et al. 2018).

Similarly to the work presented in this paper, k-Nearest Neighbour (kNN) techniques are used to create the similarity graph used in retrieval. More formally, the undirected graph G can be described as \(G = (V, E)\), where \(V=\left\{ v_1, v_2, \dots , v_n\right\}\) is the set of nodes and \(E=\left\{ e_1, e_2, \dots , e_m\right\}\) is the set of edges. The nodes represent the images in the dataset and the edges represent the connections between nodes. The weight of each edge determines how similar the two images are: the higher the weight, the more similar the two images. The weights of the edges are set to the cosine similarity calculated between the embeddings.

The problem of creating the kNN graph differs from the nearest neighbour search task: it does not need to index all the dataset images in order to quickly retrieve the elements similar to a query image, but rather to create the relations between all the similar images in the dataset (Chen et al. 2009). For example, PQ (Jegou et al. 2011) and BoI (Magliani et al. 2018) are nearest neighbour methods, but they are not suitable for this task because the data structures they adopt are not graph-based. After the creation of the graph, the application of some heuristics makes it possible to extract useful information from the graph, improving the performance of the retrieval system or of the image classifier.

Different solutions are available in the literature to efficiently create the kNN graph. The simplest is the exact, or brute-force, method. Its advantages are that it is simple to implement and that it usually obtains the best results. Unfortunately, it requires a very long time to compute.

Alternatively, approximate kNN graph algorithms aim to speed up the process while maintaining good performance after the diffusion application. They can be subdivided into two families of strategies: algorithms based on the divide-and-conquer strategy and techniques based on local search optimizations (e.g., NN-descent, Dong et al. 2011). As the name says, divide and conquer is composed of two steps: first, based on a certain heuristic, the images in the dataset are divided into subsamples, and then a kNN graph is created for each subsample. In the end, all the created subgraphs are merged, obtaining the final kNN graph. Naturally, the number of subdivisions influences the final performance and the computational time of the approximate kNN graph algorithm. Moreover, the heuristic used for the subdivision task is crucial for the method and needs to be both effective and efficient. For instance, the well-known K-means algorithm (Arthur and Vassilvitskii 2007), while widely and successfully used for clustering, is too slow for this task; the method proposed in this paper adopts a faster subdivision heuristic.

An interesting work (Zhang et al. 2013) following the divide-and-conquer strategy exploits LSH (Locality-Sensitive Hashing) to create the approximate kNN graph by using the spectral decomposition of a low-rank graph matrix. Chen et al. (2009) follow the same strategy, but apply recursive Lanczos bisection. In this case, two divide steps are proposed: the overlap and the glue method. The difference between the two techniques lies in the subsets, overlapped for the former and disjoint for the latter. Another interesting paper, from Wang et al. (2012), proposes an algorithm for the creation of an approximate kNN graph based on random collections of dataset elements. Repeating this process many times theoretically allows covering the entire dataset.

On the other hand, the methods based on local optimizations rely on the principle that “a neighbour of my neighbour is my neighbour”, introduced by Dong et al. with NN-descent (Dong et al. 2011). Starting from a random Nearest Neighbour (NN) list for each node, the method iteratively tries to update these lists. The update process is very simple: for a node a, the algorithm finds two neighbours (b and c) and then tries to update the NN list of b with the distance d(a, b) and the NN list of c with the distance d(a, c). The process is repeated until the number of updates executed on the NN lists is less than a threshold, selected as a parameter of the algorithm. A weakness of this method is the correct setting of the initial dimension of the NN lists and of the number of updates to execute on them: if the dimension of the lists is large or the number of updates is very high, the method will require a very long time to compute the kNN graph. Different works have adapted NN-descent to their specific application domains (Debatty et al. 2014; Houle et al. 2014; Park et al. 2013).

Finally, a mixed solution called Random Pair Division, based on both the divide-and-conquer strategy and NN-descent, was proposed by Sieranoja and Fränti (2018). The first step is the subdivision of the dataset elements in order to speed up the subgraph creation. The heuristic adopted is very simple: starting from two random dataset elements, each element is assigned to one of the two sets based on its distance to the two initially selected random elements. The process is repeated if the size of one set is greater than a threshold. After that, the subgraphs are created on the elements contained in the subsamples using the brute-force approach. In addition, NN-descent is applied to improve the quality of the graph and to connect elements belonging to different subgraphs.

3 Ranking with diffusion

Diffusion is a mechanism that exploits the graph structure of the collection to find the images similar to the one submitted as the query (Donoser and Bischof 2013; Zhou et al. 2004). To apply diffusion, we need an affinity matrix, defined as follows.

The affinity matrix A is the adjacency matrix of a weighted undirected graph G. It is symmetric (\(A = A^T\)), positive (\(A > 0\)) and with zero self-similarities (\(diag(A) = 0\)). In order to apply diffusion, the Laplacian of the graph \({\mathcal {L}} = D - A\) is calculated, where \(D = diag(A1_n)\) is the diagonal degree matrix of the graph and \(A1_n\) is the vector of the row-wise sums of A. A further typical step is to normalize the affinity matrix, obtaining the transition matrix \(S = D^{-1/2}AD^{-1/2}\) and the normalized Laplacian \({\mathcal {L}} = I_n - S\), where \(I_n\) indicates the identity matrix of size n.

After the creation of the Laplacian and its normalization, Zhou et al. (2004) proposed to apply diffusion for retrieval purposes starting from the query points. They defined a vector \(y = (y_i) \in {\mathbb {R}}^n\) as follows:

$$\begin{aligned} y_{i} = \begin{cases} 1 &{} \text {if } x_i \text { is a query} \\ 0 &{} \text {otherwise} \end{cases} \end{aligned}$$

The objective of ranking with diffusion is to find the neighbours of a query; therefore a ranking function \(f = (f_i) \in {\mathbb {R}}^n\) is created, generating a vector with the similarity score of each image \(x_i\) with respect to the query. It is worth noting that this process needs to be repeated for each query. The diffusion mechanism can be represented by the ranking function in the following way:

$$\begin{aligned} f^t = \alpha S f^{t-1} + (1-\alpha )y \end{aligned}$$

The ranking function defines a random walk process on the graph, where \(\alpha\) indicates the probability of jumping to an adjacent vertex according to the distribution S and (\(1 - \alpha\)) indicates the probability of jumping to a query point. At the beginning of this process, the ranking function is initialised with the values obtained from the application of the Euclidean distance. Repeating this process many times allows each point to spread its ranking score to its neighbours in the graph. Exploiting this principle, it is possible to capture the manifold structure of the dataset better than by applying the Euclidean distance.
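A minimal sketch of this update in C++ (the language of our implementation, see Sect. 4.7) is reported below. It is a dense-matrix illustration only, with S, y, alpha and iterations following the notation of this section; the actual pipeline operates on sparse matrices and solves the equivalent linear system with the conjugate gradient (see Sect. 5.3).

#include <cstddef>
#include <vector>

// One run of the diffusion update f^t = alpha * S * f^(t-1) + (1 - alpha) * y.
// For simplicity f^0 is initialised from y; in the paper it starts from the
// Euclidean-distance scores.
std::vector<double> diffuse(const std::vector<std::vector<double>>& S,
                            const std::vector<double>& y,
                            double alpha, int iterations) {
    const std::size_t n = y.size();
    std::vector<double> f = y;
    std::vector<double> next(n, 0.0);
    for (int t = 0; t < iterations; ++t) {
        for (std::size_t i = 0; i < n; ++i) {
            double acc = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                acc += S[i][j] * f[j];   // (S f)_i: scores flowing in from the neighbours
            next[i] = alpha * acc + (1.0 - alpha) * y[i];
        }
        f.swap(next);
    }
    return f;  // higher score = closer to the query on the manifold
}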

4 Proposed approach

LSH kNN graph adopts LSH to subdivide the global descriptors representing the images of the dataset into subsets. The number of subsets depends on the hash dimension used for the projection phase, while the size of each subset usually depends on the dataset size, because the elements are distributed in a roughly similar way across buckets. In the following, first the hashing technique is introduced and then the entire algorithm is described.

4.1 Notations and background of LSH

Locality-Sensitive Hashing (LSH) (Indyk and Motwani 1998) is a hashing technique based on the principle that similar points will be close also in the projected space with high probability.

The LSH function for Hamming space is a scalar projection:

$$\begin{aligned} h({\mathbf {x_f}})=sign({\mathbf {x_f}}\cdot {\mathbf {p}}) \end{aligned}$$
(1)

where \({\mathbf {x_f}}\) is the feature vector and \({\mathbf {p}}\), called the projection function, is a vector whose components are randomly drawn from a Gaussian distribution \({\mathcal {N}}(0,1)\).

This process can be repeated many times (L represents the number of hash tables used in the LSH process) in order to improve the quality of the projections, using a different Gaussian distribution each time.

A common LSH application for retrieval purposes can be summarised with these three steps:

  1. project all the database descriptors using different Gaussian distributions;

  2. for each query, project the image descriptor using the same Gaussian distributions adopted for the database elements;

  3. search and rank the database images in the hash table buckets.

Many other hashing techniques have been proposed and implemented. For example, multi-probe LSH (Lv et al. 2007) tries to reduce the number of hash tables used for the projections, exploiting the fundamental principle of LSH that similar items will be projected into the same or nearby buckets with high probability. This idea is implemented by checking, during the search phase, also the buckets near the query bucket. Unfortunately, the performance improvement comes at the cost of an increase in computational time.

4.2 LSH kNN graph

Fig. 2

Pipeline of a basic algorithm for kNN graph construction based on LSH projections and following the divide-and-conquer strategy

LSH kNN graph creates an undirected kNN graph G from a dataset \({\mathcal {S}} = \{s_1, \dots , s_{\mathcal {N}}\}\) of \({\mathcal {N}}\) images. To create the graph and connect the nodes through edges, a similarity measure \(\theta : {\mathcal {S}} \times {\mathcal {S}} \rightarrow {\mathbb {R}}\) is adopted. The connection between nodes i and j in the graph is weighted with the similarity measure \(\theta (s_i, s_j) = \theta (s_j, s_i)\). There are different techniques to calculate the similarity measure; for our purpose we adopted the cosine similarity, which can be calculated as the dot product (scaled by magnitude) between the global image descriptors of the dataset images. The proposed approach follows the divide-and-conquer strategy, since the first step is the split of the dataset elements into many subsets based on LSH projections, as shown in Fig. 2. As previously reported, LSH projects similar elements into the same bucket of a projected space. Exploiting this principle, it is possible to create a set of buckets \(B = \{B_1, \dots , B_m\}\) from several hash tables. The number of bits (\(\delta\)) used in the projection step influences the quality of the results and the final number of buckets. Considering also the number of hash tables (L) adopted for the projection, the total number of buckets will be \(N = 2^\delta \cdot L = |B| \cdot L\). We will indicate the n elements of the i-th bucket \(B_i\) as follows: \(B_i = \{b_{i1}, \dots , b_{in}\}\). There is no guarantee that all the similar elements will end up in the same bucket, because this approach is an approximate solution. As a consequence, it is advisable to find a trade-off between the number of buckets per hash table (\(2^\delta\)), tuned through the number of bits (\(\delta\)) used in the projection step, and the number of hash tables (L). Experiments on the values of these two parameters are reported in Sect. 4.7. If the objective is to project more elements into the same bucket, a good solution is to use a small number of buckets: this reduces the time spent in the divide phase, but the conquer phase will require more time. Conversely, with more bits adopted for the projections, and thus more buckets per hash table, the divide step will be slightly slower, but the conquer step will be faster.
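A minimal sketch of the divide phase is reported below; bucketIndex() is a hypothetical projection routine (a bit-packing version of it is sketched in Sect. 4.6), and all names are illustrative rather than the actual implementation.

#include <cstdint>
#include <vector>

// buckets[t][b] lists the indices of the descriptors that fall into bucket b
// of hash table t; with delta bits there are 2^delta buckets per table.
std::vector<std::vector<std::vector<int>>>
divide(const std::vector<std::vector<float>>& descriptors, int L, int delta,
       uint32_t (*bucketIndex)(const std::vector<float>&, int table)) {
    std::vector<std::vector<std::vector<int>>> buckets(
        L, std::vector<std::vector<int>>(1u << delta));
    for (int i = 0; i < static_cast<int>(descriptors.size()); ++i)
        for (int t = 0; t < L; ++t)            // one projection per hash table
            buckets[t][bucketIndex(descriptors[i], t)].push_back(i);
    return buckets;
}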

The conquer step provides the connection among the elements in each bucket. During this phase, the pipeline connects the dataset elements and stores the final graph in memory. The method adopted to solve this subtask is the brute-force approach: all the elements in a bucket are connected, creating a kNN graph \(G = \left( V,E\right)\), where \(V = \left( b_{x1},\dots ,b_{xn}\right)\) with \(x = 1,\dots ,m\), and E is the set of edges with weights computed with the similarity \(\theta\): \(E=\left\{ \forall \left( b_{xi}, b_{xj}\right) \in B_i : \theta \left( b_{xi}, b_{xj}\right) \right\}\) with \(x = 1,\dots ,m\). The key point here is that applying brute-force several times on smaller sets turns out to be faster than applying it once on the entire, larger set of data. Moreover, differently from other methods based on the divide-and-conquer strategy, no final merge between the subgraphs is required, since in our case a single graph is created and updated with new connections. For more details on the implementation, see Sect. 4.6.

4.3 Multi-probe LSH kNN graph

Fig. 3

Pipeline of multi-probe LSH kNN graph. Best viewed in colour

In addition to the basic LSH kNN graph described in the previous section, a multi-probe version is also proposed here. This method exploits the principle of multi-probe LSH with the objective of reducing the number of hash tables used.

Multi-probe LSH (Lv et al. 2007), during the query phase, also checks the buckets near the query bucket \(b_{query}\), because they probably contain elements similar to the ones contained in it. For our purpose, this idea can be exploited during the projection step: each dataset element, after the hashing phase, is also projected into the neighbouring buckets, as shown in Fig. 3. The process will be slightly slower due to the greater number of projections to be performed. In order to maintain a good trade-off between the quality of the graph and the computational time, the elements are projected only into the 1-neighbourhood. It is worth noting that the bucket codes are binary numbers, so the Hamming distance can be exploited. As a consequence, the 1-neighbourhood represents the set of buckets with Hamming distance less than or equal to 1 (\(H_d(b_{xi},b_{xj}) \le 1\)).

More formally, the elements obtained with the application of the multi-probe LSH are the followings:

$$\begin{aligned} B_{multi\text {-}probe} = \{b_{x1}, \dots , b_{xn} \} : H_d(b_{query},b_{xj}) \le 1 \wedge b_{xj} \in B;\quad x=1,\dots ,m,\ j=1,\dots ,n \end{aligned}$$
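Since the bucket codes are \(\delta\)-bit binary strings, the 1-neighbourhood of a bucket can be enumerated simply by flipping one bit at a time; a minimal sketch with illustrative names follows.

#include <cstdint>
#include <vector>

// All bucket codes within Hamming distance 1 of `bucket`:
// the bucket itself plus one code per flipped bit (delta + 1 in total).
std::vector<uint32_t> oneNeighbourhood(uint32_t bucket, int delta) {
    std::vector<uint32_t> out;
    out.push_back(bucket);                    // Hamming distance 0
    for (int bit = 0; bit < delta; ++bit)
        out.push_back(bucket ^ (1u << bit));  // Hamming distance 1
    return out;
}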

Similarly to the basic LSH kNN graph, the number of bits used for the hashing task directly influences the number of buckets reachable from each element: within Hamming distance r there are \(\sum _{i=0}^{r}{{\delta } \atopwithdelims (){i}}\) buckets, which for the 1-neighbourhood (\(r = 1\)) amounts to \(\delta + 1\) buckets.

Although the final results are usually better than the ones obtained by the basic method, the total computational time needed by this approach is greater as well. A possible solution to this problem is to use a percentage \(\gamma\) that determines, in an unsupervised way, whether or not to project each element also into the 1-neighbourhood. For example, by setting \(\gamma = 50\%\), only half of the elements will also be projected into the 1-neighbourhood buckets. Empirically, it has been found that the best trade-off is reached using \(\gamma = 50\%\), which is the value used in all our experiments.

4.4 Complexity analysis

In this section, we will briefly analyse the complexity of the proposed methods.

For the projection phase, when LSH is applied the complexity will be \(O(\delta \cdot \varDelta \cdot L \cdot {\mathcal {N}})\), where \({\mathcal {N}}\) is the number of images in the dataset, \(\delta\) is the number of bits used in each projection, L is the number of hash tables and \(\varDelta\) represents the dimension of the embedding used to represent the input image. In the case of multi-probe LSH, the complexity will be greater, because each image is projected into more buckets: \(O(\delta \cdot \varDelta \cdot L \cdot ({\mathcal {N}} \cdot \gamma \cdot L) \cdot {\mathcal {N}}).\)

Then, the calculation of the similarity measure for all the possible pairs of elements contained in a bucket has a complexity of \(O(n^2 \cdot 2^\delta \cdot L)\), where n represents the number of elements found in the bucket. Hypothesizing a uniform distribution of the elements across the buckets, the value of n can be approximated as \(n \sim \frac{{\mathcal {N}}}{2^\delta }\).
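As a rough illustration, taking the Oxford5k size \({\mathcal {N}} = 5063\) (Sect. 5.1) and the parameters \(\delta = 6\) and \(L = 20\) adopted for this dataset:

$$\begin{aligned} n \approx \frac{5063}{2^6} \approx 79, \qquad n^2 \cdot 2^\delta \cdot L \approx 79^2 \cdot 64 \cdot 20 \approx 8.0 \times 10^6 \ll {\mathcal {N}}^2 \approx 2.6 \times 10^7 \end{aligned}$$

that is, roughly a third of the similarity computations required by the brute-force graph, before even accounting for the edges discarded by the threshold.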

Fig. 4

Distribution of dataset images projected through LSH on Oxford5k with \(\delta = 6\) and \(L = 20\)

Fig. 5

Distribution of dataset images projected through LSH on Oxford5k with \(\delta = 7\) and \(L = 20\)

To support this hypothesis, Figs. 4 and 5 show the LSH distributions (for different values of \(\delta\)) on the Oxford5k dataset (see Sect. 5). The values reported in each graph represent the distribution of the database elements over the buckets.

Following Pearson et al. (1977), we executed the Pearson test, which evaluates the null hypothesis that a sample comes from a normal distribution. For the first distribution, the null hypothesis can be rejected, since the chi-squared probability of the hypothesis test is lower than the threshold (\(th = 0.001\)). For the second distribution, instead, the null hypothesis cannot be rejected, because the value obtained by the test is greater than the threshold; this does not prove that the distribution is Gaussian, but it is compatible with that hypothesis.

The complexity of the combination of the subgraphs (conquer phase) is negligible, because all the subgraphs are directly appended to the final graph.

To conclude, the final complexity of the proposed approaches can be obtained by summing the single components:

  • for the basic LSH kNN graph approach: \(O(\delta \cdot \varDelta \cdot L \cdot {\mathcal {N}}) + O(n^2 \cdot 2^\delta \cdot L)\), which (exploiting the approximation of n mentioned before) can be further simplified to \(O(\frac{L \cdot {\mathcal {N}}^2}{4} + L \cdot {\mathcal {N}} \cdot \delta \cdot \varDelta )\);

  • for the multi-probe LSH kNN graph approach: \(O\left( \delta \cdot \varDelta \cdot L \cdot \left( {\mathcal {N}} \cdot \gamma \cdot L\right) \cdot {\mathcal {N}}\right) + O\left( n^2 \cdot 2^\delta \cdot L\right)\), which can be simplified as before and, removing lower-order terms, becomes \(O(L^2 \cdot {\mathcal {N}}^2 \cdot \delta \cdot \varDelta \cdot \gamma )\).

Therefore, it is evident that while the basic LSH kNN approach is bounded by \(O\left( L \cdot {\mathcal {N}}^2\right)\), the multi-probe version is, as expected, more computationally complex, being bounded by \(O\left( L^2 \cdot {\mathcal {N}}^2\right)\).

4.5 Graph refinement

Fig. 6

Example of how the graph refinement techniques work. Some extra edges are added in order to improve the quality of the final kNN graph. The new connections are coloured with the same colour as the related nodes in order to clarify the idea behind the algorithm. Best viewed in colour (Color figure online)

Graph refinement, or neighbour propagation, is an important step during the kNN graph creation task. It refines the quality of the graph in order to improve the final results. In general, the algorithm aims at adding more edges between nodes in the graph (as shown in Fig. 6), since these edges will hopefully improve the diffusion result. Unfortunately, these improvements require extra effort and increase the final computation time.

The most widespread graph refinement method is one-step neighbour propagation (Dong et al. 2011). It is an iterative process in which the neighbours of neighbours are checked. In other words, if a is a neighbour of b and b is a neighbour of c, then it is likely that a is a neighbour of c. This approach requires maintaining a kNN list for each node. For each node, two neighbours are randomly picked and then connected if their similarity is greater than the worst one in the list, also updating the other kNN lists accordingly. This process continues until the number of updates on the kNN lists surpasses a threshold value.

In this paper we propose a novel method called sorted neighbour propagation, which improves on the previously presented technique. The kNN lists are sorted based on the similarity values obtained during the creation of the kNN graph, and then only the topN elements are evaluated: all the pairs of neighbours found among these topN elements with a similarity value greater than the threshold are added to the graph. Increasing topN improves the quality of the final graph, but the time needed for the creation of the graph grows in a non-linear way. Experiments in the next section will show the performance of the proposed method on different public image datasets compared to other state-of-the-art techniques: kNN graph without graph refinement (as a baseline), random propagation and one-step neighbour propagation. The baseline is an approximate kNN graph constructed using the LSH kNN graph method previously explained, but with different parameters: \(\delta = 6\) and \(L = 2\) instead of \(\delta = 6\) and \(L = 20\). This parameter choice makes it easy to highlight the improvements of the graph refinement techniques on small approximate graphs.
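A minimal sketch of sorted neighbour propagation is reported below, under the assumption that the graph is kept as per-node lists of (neighbour, similarity) pairs; topN, the threshold th and the similarity theta() follow the notation above, while the code names themselves are illustrative.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using NodeList = std::vector<std::pair<int, float>>;  // (neighbour id, similarity)

void sortedNeighbourPropagation(std::vector<NodeList>& knn, int topN, float th,
                                float (*theta)(int, int)) {
    for (std::size_t u = 0; u < knn.size(); ++u) {
        // sort the kNN list by decreasing similarity, keeping the strongest links first
        std::sort(knn[u].begin(), knn[u].end(),
                  [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
                      return a.second > b.second;
                  });
        const int m = std::min(topN, static_cast<int>(knn[u].size()));
        // connect every pair of top neighbours whose similarity exceeds the threshold
        for (int i = 0; i < m; ++i)
            for (int j = i + 1; j < m; ++j) {
                const int a = knn[u][i].first, b = knn[u][j].first;
                const float sim = theta(a, b);
                if (sim >= th) {
                    knn[a].push_back({b, sim});
                    knn[b].push_back({a, sim});
                }
            }
    }
}

Possible duplicate edges introduced by the propagation can be removed afterwards, for instance with the sort-and-deduplicate step used for the COO matrix in Sect. 4.6.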

4.6 Implementation details of LSH kNN graph approach

The projection algorithm works as follows. For each bit, we execute the dot product between the image descriptor and the corresponding projection vector. If the result is positive, the projected bucket index is increased by the corresponding power of two. For example, considering a hash table composed of 8 buckets (\(\delta = 3\)), with the first dot product negative and the second and third positive, the element will be projected into the sixth bucket, because \(6 = (2^0) \cdot {\mathbf {0}} + (2^1) \cdot {\mathbf {1}} + (2^2) \cdot {\mathbf {1}}\). This process is executed for each hash table and for all the image descriptors.
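A minimal sketch of this bit-packing step for a single hash table follows (illustrative names; projections holds the \(\delta\) Gaussian projection vectors of Eq. 1).

#include <cstddef>
#include <cstdint>
#include <vector>

// Bucket index of a descriptor: bit k of the index is set when the dot product
// with the k-th projection vector is positive, so e.g. the sign pattern (-, +, +)
// yields (2^0)*0 + (2^1)*1 + (2^2)*1 = 6.
uint32_t bucketIndex(const std::vector<float>& descriptor,
                     const std::vector<std::vector<float>>& projections) {
    uint32_t bucket = 0;
    for (std::size_t k = 0; k < projections.size(); ++k) {
        double dot = 0.0;
        for (std::size_t d = 0; d < descriptor.size(); ++d)
            dot += descriptor[d] * projections[k][d];
        if (dot > 0.0)
            bucket |= (1u << k);
    }
    return bucket;
}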

Two implementation variants of our LSH kNN graph are proposed. From now on, the kNN graph will be represented by the affinity matrix A, which holds the edge weights between all the nodes. This abstraction helps in the implementation of the algorithms.

$$\begin{aligned} A = \begin{bmatrix} a_{11} &{} a_{12} &{} a_{13} &{} \dots &{} a_{1N} \\ a_{21} &{} a_{22} &{} a_{23} &{} \dots &{} a_{2N} \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{N1} &{} a_{N2} &{} a_{N3} &{} \dots &{} a_{NN} \end{bmatrix} \end{aligned}$$

Furthermore, not all the similarities are useful for the diffusion process, which suggests removing, or avoiding to insert, edges with weight less than a threshold (th), without jeopardising the final retrieval performance. From our experiments, this threshold can be set to 0.3.

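A minimal sketch of this filling procedure, in the dense-matrix variant, is the following (illustrative names; theta() is the cosine similarity and th the threshold discussed above):

#include <cstddef>
#include <vector>

// Fill the (zero-initialised) N x N affinity matrix A: for every pair of
// descriptors sharing a bucket, keep the edge only if theta() exceeds th.
void fillAffinity(std::vector<std::vector<float>>& A,
                  const std::vector<std::vector<int>>& buckets,
                  float th, float (*theta)(int, int)) {
    for (const auto& bucket : buckets)                  // one subset per (table, bucket)
        for (std::size_t i = 0; i < bucket.size(); ++i)
            for (std::size_t j = i + 1; j < bucket.size(); ++j) {
                const float sim = theta(bucket[i], bucket[j]);
                if (sim >= th) {
                    A[bucket[i]][bucket[j]] = sim;      // symmetric matrix:
                    A[bucket[j]][bucket[i]] = sim;      // set both entries
                }
            }
}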

The procedure fills the affinity matrix A as follows: at the beginning, each element of the matrix is set to 0.0; then, if the similarity measure between two nodes is greater than the threshold, the corresponding entry becomes \(a_{ij} = \theta (d_j, d_i)\), where \(d_i\) and \(d_j\) represent two images of the dataset that are projected into the same bucket for the LSH kNN graph, or into the same or a 1-neighbouring bucket for the multi-probe LSH kNN graph approach.

Unfortunately, it is impossible to apply this approach to large datasets, because pre-allocating the entire dense matrix depends on the available RAM, and it will hardly be possible on datasets larger than 100k images. Therefore, in this case, instead of working on a dense matrix, a sparse matrix is used.

Sparse matrices reduce the computational time and still obtain good results on large datasets, because the affinity matrices typically contain many zeros. For instance, on the Oxford5k dataset the approximate kNN graph has only \(0.7\%\) of the edges of the brute-force kNN graph.

Moreover, considering that the matrix is symmetric, only the upper or the lower triangle of the matrix is needed. Therefore, the condition adopted in the LSH kNN graph procedure can be changed in this way:

$$\begin{aligned} a_{ij} = \begin{cases} \theta (d_i, d_j) &{} \text {if}\, j \ge i \wedge \theta (d_i, d_j) \ge th \\ 0 &{} \text {otherwise} \end{cases} \end{aligned}$$

If the column index is not greater than the row index, the row and column are swapped, exploiting the symmetry of the affinity matrix.

Two different types of sparse matrix have been tested: the Compressed Row Storage (CRS) format and the Coordinate (COO) format (Golub and Van Loan 2012). The CRS sparse matrix is composed of three vectors: values (containing the non-zero values of the dense matrix); column indexes (containing the column index of each element in the values vector); and row pointers (containing the locations in the values vector where a new row begins). The COO sparse matrix, instead, is composed of three vectors: the non-zero values, and the row and column coordinates of each value in the values vector. The second solution is simpler to implement than the first, but it requires more space on disk.

However, using hash tables, it happens that the same edge weight is inserted multiple times. Therefore, every time a new value is inserted into a CRS matrix, a possible solution would be to check whether the value is already in the matrix; unfortunately, this tends to be a time-consuming process. Conversely, using a COO matrix, all the values (including repeated ones) are inserted, then a single sort is performed and the duplicates are removed. Applying the sorting once and removing the duplicates is faster than performing the search \({\mathcal {N}} \cdot L\) times: the single \(O({\mathcal {N}}\log _2{\mathcal {N}})\) sort costs less than the \(O({\mathcal {N}}^2 \cdot L)\) total cost of repeating an \(O({\mathcal {N}})\) search \({\mathcal {N}} \cdot L\) times.
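A minimal sketch of this sort-and-deduplicate step on the COO triples follows (illustrative, not the actual implementation).

#include <algorithm>
#include <tuple>
#include <vector>

using Triple = std::tuple<int, int, float>;  // (row, column, similarity)

// Sort once (lexicographically by row, then column) and drop the repeated
// coordinates that the multiple hash tables may have produced.
void deduplicate(std::vector<Triple>& coo) {
    std::sort(coo.begin(), coo.end());
    coo.erase(std::unique(coo.begin(), coo.end(),
                          [](const Triple& a, const Triple& b) {
                              return std::get<0>(a) == std::get<0>(b) &&
                                     std::get<1>(a) == std::get<1>(b);
                          }),
              coo.end());
}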

4.7 Parameters of LSH kNN graph approach

The proposed method uses LSH projections for the creation of the approximate kNN graph. The main advantages of LSH are its simplicity of use and its speed: for example, applying LSH to 100k images in C++ takes only 10 seconds. It is worth noting that varying the values of the LSH parameters can change the final performance considerably. For both parameters (\(\delta\) and L), several experiments should be run in order to find the best combination.

Table 1 Diffusion parameter values adopted for the experiments on Oxford5k

Table 1 shows the mAP obtained on Oxford5k and the time needed to create the approximate kNN graph while varying the LSH kNN graph parameters L and \(\delta\). The values adopted for the diffusion parameters are the following: \(\alpha = 0.94\), \(\beta = 1\), \(\gamma = 3\), \(k_s = 39\), \(k = 26\), \(iterations = 26\), \(truncation = 3382\).

Table 2 Diffusion parameter values adopted for the experiments on Paris6k

Similarly, Tables 2, 3, 4 and 5 show the mAP obtained on Paris6k, \({\mathcal {R}}\)Oxford5k, \({\mathcal {R}}\)Paris6k and Oxford105k, respectively, evaluating the different combinations of L and \(\delta\) values. It is worth noting that, in the case of \({\mathcal {R}}\)Oxford5k and \({\mathcal {R}}\)Paris6k, three results are reported for each combination, for the easy, medium and hard sets.

Table 3 Diffusion parameter values adopted for the experiments on \({\mathcal {R}}\)Oxford5k
Table 4 Diffusion parameter values adopted for the experiments on \({\mathcal {R}}\)Paris6k
Table 5 Diffusion parameter values adopted for the experiments on Oxford105k

5 Experimental results

Previous works have evaluated methods for creating approximate kNN graphs by checking the number of edges in common between the approximate and the exact kNN graph. Instead, our aim is to evaluate the complete kNN graph pipelines after the diffusion and retrieval modules. The rationale of this choice lies in our objective of evaluating how effective (and efficient) our proposals for approximate kNN graph creation are in terms of retrieval accuracy when diffusion is applied.

The features used in all the experiments for the creation of the kNN graphs are R-MAC descriptors (Iscen et al. 2017).

The hardware adopted for the experiments is the following: Intel Core i7 CPU @ 3.40 GHz x 8, 32 GB DDR4 RAM.

5.1 Datasets

There are many image datasets used to evaluate Content-Based Image Retrieval algorithms. The most used are the following:

  • Oxford5k (Philbin et al. 2007) is composed of 5063 images representing buildings and places of Oxford (UK), subdivided into 11 classes. All the images are used as database images; the query images are 55, and they are cropped to make the querying phase more difficult;

  • \({\mathcal {R}}\)Oxford5k (Radenović et al. 2018) is composed of 4993 images. This dataset is the revisited version of the previous one, with 70 queries that are new images added to the old dataset. All the images are labelled in order to test the pipeline at 3 different retrieval difficulties: Easy, Medium and Hard;

  • Paris6k (Philbin et al. 2008) is composed of 6412 images representing buildings and places of Paris (France), subdivided into 12 classes. All the images are used as database images; the query images are 55, and they are cropped to make the querying phase more difficult;

  • \({\mathcal {R}}\)Paris6k (Radenović et al. 2018) is composed of 6322 images. As before, this dataset is the revisited version of the previous one, with 70 additional queries and the same three difficulties: Easy, Medium and Hard;

  • Flickr1M (Huiskes and Lew 2008) contains 1 million Flickr images released under the Creative Commons license. It is used for large-scale evaluation. The images are divided into multiple classes and are not specifically selected for the image retrieval task.

Moreover, adding 100k images from Flickr1M makes it possible to create the Oxford105k dataset.

For each image dataset, we split the query set into 1/5 as validation set and 4/5 as test set. We executed the genetic algorithms to tune the diffusion parameters on the validation set and finally evaluated the method on the test set.

5.2 Evaluation metrics

To evaluate the accuracy of the retrieval phase, the mean Average Precision (mAP) is used on all the image datasets. The mAP is the mean over the queries of the average precision, which measures how many of the retrieved elements are relevant to the query image. In order to compare a query image with the database, the \(L_2\) distance is employed.
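For reference, denoting with Q the query set, with \(R_q\) the number of images relevant to query q, with \(P_q(k)\) the precision at rank k and with \(rel_q(k) \in \{0,1\}\) the relevance of the k-th ranked image:

$$\begin{aligned} mAP = \frac{1}{|Q|} \sum _{q \in Q} AP(q), \qquad AP(q) = \frac{1}{R_q} \sum _{k=1}^{n} P_q(k) \cdot rel_q(k) \end{aligned}$$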

5.3 The importance of diffusion for retrieval

Before starting the evaluation of the approximate kNN graph pipelines, it is worth to better motivate our choice of using diffusion on graphs and, therefore, the need for an efficient creation of an approximate kNN graph.

Fig. 7

Comparison of results obtained using R-MAC descriptors (Iscen et al. 2017) tested with different approaches on Oxford5k, Paris6k, \({\mathcal {R}}\)Oxford5k and \({\mathcal {R}}\)Paris6k

Figure 7 shows some experiments performed on the Oxford5k, Paris6k, \({\mathcal {R}}\)Oxford5k and \({\mathcal {R}}\)Paris6k datasets. In each experiment, query expansion is executed using the top 5 elements of the query ranking, following the original approach (Chum et al. 2007).

The diffusion parameters are set through an optimization process based on genetic algorithms (Magliani et al. 2019), with the following meanings:

  • \(\alpha\) indicates the contribution to the ranking score coming from the neighbours.

  • \(\beta\) indicates the exponentiation of the affinity matrix elements.

  • \(\gamma\) indicates the exponentiation of the query vector elements.

  • iterations represents the maximum number of iterations for the resolution of the equation \(A*f = y\) in the diffusion process through the application of the conjugate gradient, where A represents the affinity matrix of the dataset elements, y identifies the query vector and f is the ranking vector (see the fixed-point form after this list).

  • \(k_s\) represents the maximum number of nodes to cross during the random walk process.

  • k represents the number of neighbours to find.

  • truncation represents the number of rows of the affinity matrix to use for diffusion.
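For clarity, the linear system mentioned in the list above follows directly from the update rule of Sect. 3: at the fixed point,

$$\begin{aligned} f^* = \alpha S f^* + (1-\alpha ) y \iff (I_n - \alpha S) f^* = (1-\alpha ) y \end{aligned}$$

which is the sparse linear system solved by the conjugate gradient (the scaling factor \(1-\alpha\) does not change the final ranking).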

Table 6 illustrates the values of diffusion parameters for different image datasets: Oxford5k, \({\mathcal {R}}\)Oxford5k, Paris6k, \({\mathcal {R}}\)Paris6k and Oxford105k.

Table 6 Diffusion parameter values adopted for the experiments on each image dataset

Figure 7 demonstrates clearly that diffusion can bring significant improvements in the retrieval accuracy.

5.4 Results on Oxford5k

Table 7 Comparison of different approaches to kNN graph creation tested on Oxford5k using different types of embeddings

Table 7 reports the retrieval results obtained with diffusion on different kNN graphs, constructed with several algorithms. Different values of the LSH parameters (\(\delta\) and L) produce different final retrieval results. After several experiments, the best combination for the LSH kNN graph approach is \(\delta = 6\) and \(L = 20\), while for the multi-probe LSH kNN graph method it is \(\delta = 6\) and \(L = 2\). It is evident that the LSH kNN graph approach obtains the best trade-off between performance and the computational time needed for the entire process (also considering the sum of LSH projection and graph creation).

Conversely, NN-descent (Dong et al. 2011) achieves poor results. In this case, the number of neighbours evaluated in the NN-descent process is set to 100, meaning that each kNN list is composed of 100 elements. In addition, this approach turned out to be the slowest one, with 115 s needed to create the kNN graph. Increasing this value allows creating a more accurate final kNN graph, but at an important extra time cost.

RP-div (Sieranoja and Fränti 2018) also obtains quite poor results, although quickly. This is probably due to the nature of this method, based on the randomness of the points used for the divide step, which speeds up the graph creation process but does not yield good retrieval accuracy. The maximum size of each set is 75, meaning that every set larger than 75 elements is further split. As for NN-descent, increasing the maximum set size improves the retrieval results obtained on the kNN graph after the diffusion application; unfortunately, this change increases the time needed for the graph creation.

The last compared approach is the method proposed by Wang et al. (2012), whose computational time is not very high, probably because the elements are randomly chosen. It also achieves good accuracy, although the final quality of the graph depends too much on the random choice of the elements. In this case, the number of iterations is set to 200 and the elements for each set are 500. To improve the quality of the results reported by the Wang et al. method, the number of iterations and the number of elements per set should be increased.

Lastly, the basic brute-force method turns out to offer a good trade-off between accuracy and efficiency.

Table 8 Comparison of different techniques for graph refinement adopted on the baseline LSH kNN graph for Oxford5k

Table 8 reports the results obtained after the application of some graph refinement techniques on the baseline LSH kNN graph. The first row shows the baseline method with no graph refinement. Random propagation (second and third rows) randomly adds new edges to the graph (100 and 500 new edges, respectively). This graph refinement technique improves the mAP significantly, with a limited increase in the time needed for graph creation. Similar considerations hold for one-step neighbour propagation, which requires some extra time to create the graph while obtaining results comparable with random propagation. Our proposed method (last row) obtains the best retrieval precision, but at a much higher computational cost, related to the sort operation executed in order to connect only interesting and useful nodes in the kNN graph. Please note that in this case we used \(\delta =6\) and \(L=2\) (instead of \(\delta = 6\) and \(L = 20\) as in the previous table), resulting in a different graph creation time for our approach.

5.5 Results on \({\mathcal {R}}\)Oxford5k

Table 9 Comparison of different approaches to kNN graph creation tested on \({\mathcal {R}}\)Oxford5k using different types of embeddings

We performed the same experiments on the \({\mathcal {R}}\)Oxford5k dataset. Table 9 reports the retrieval results and the computational time. The parameters of all the approaches are kept unchanged with respect to the previous dataset, except for Wang et al. (2012), for which the number of iterations is increased to 400 and the elements of each set are 200. As mentioned before, in this case three different mAP values are shown, accounting for the three different difficulties.

The proposed approaches (first two rows) exhibit the best trade-off between accuracy and efficiency in all three cases, with comparable (even slightly better) results with respect to brute-force. The other approaches confirm their generally poor performance when both measures (time and accuracy) are considered.

Table 10 Comparison of different techniques for graph refinement adopted on the baseline LSH kNN graph for \({\mathcal {R}}\)Oxford5k

Table 10 reports the results for the graph refinement techniques. The quality of the graph obtained with the proposed sorted neighbour propagation method outperforms the other methods in all three cases, but still suffers from a higher computational time. It is worth remembering (as for the previous dataset) that this experiment has been conducted with \(\delta =6\) and \(L=2\).

5.6 Results on Paris6k

Table 11 Comparison of different approaches to kNN graph creation tested on Paris6k

Similar results and conclusions are obtained on the Paris6k dataset. In fact, Table 11 shows that the proposed methods (especially the basic LSH kNN graph) achieve the best results in terms of mAP, in less time than the brute-force approach. The configuration of each algorithm is the same as for the previous datasets. Moreover, Table 12, comparing the graph refinement techniques, leads to similar considerations as before: the proposed method (last row) gets the best accuracy in all the cases, but at a higher computational cost.

Table 12 Comparison of different techniques for graph refinement adopted on the baseline LSH kNN graph for Paris6k

5.7 Results on \({\mathcal {R}}\)Paris6k

Table 13 Comparison of different approaches to kNN graph creation tested on \({\mathcal {R}}\)Paris6k using different types of embeddings

Table 13 presents the results obtained and the computational time on the \({\mathcal {R}}\)Paris6k dataset. The performance is similar to that obtained on \({\mathcal {R}}\)Oxford5k, with our approach outperforming brute force while being slightly faster.

The same conclusions as for the previous datasets can be drawn for the graph refinement techniques on this dataset (Table 14).

Table 14 Comparison of different techniques for graph refinement adopted on the baseline LSH kNN graph for \({\mathcal {R}}\)Paris6k

5.8 Results on Oxford105k

Table 15 Comparison of different approaches to kNN graph creation tested on Oxford105k

Finally, this section reports the results on a larger dataset, Oxford105k. Unfortunately, in this case the graph refinement techniques could not be tested due to our limited hardware resources. Moreover, we have not conducted tests for RP-div (Sieranoja and Fränti 2018) and NN-descent (Dong et al. 2011), since they already demonstrated their limited performance on smaller datasets.

Therefore, Table 15 presents the results of the experiments executed on Oxford105k. The growth of the dataset size influences the graph creation time, but it is worth noting how our approaches scale better than brute-force, keeping the total computational time at 176 seconds while still achieving better accuracy than brute force.

6 Conclusions

In this paper, we presented an algorithm called LSH kNN graph for the creation of an approximate kNN graph, exploiting LSH projections and suited to the application of diffusion in the CBIR task. The proposed method follows the divide-and-conquer strategy: the dataset elements are subdivided through an unsupervised hashing function, and then in each subset a subgraph is created using the brute-force approach. The proposed approach obtains the same or better results than other state-of-the-art methods, but in less time. The approximation introduced in our graph, with respect to the brute-force graph, reduces the noise introduced by the creation of the kNN graph, making the diffusion process more robust to noisy edges. Regarding the memory footprint, the implementation with sparse matrices, combined with the other code optimizations, achieves very good results with limited memory requirements.

Moreover, a multi-probe LSH kNN graph algorithm is proposed, based on the principle of multi-probe LSH. In this case, the elements are also projected into the 1-neighbourhood buckets, which allows reducing the number of hash tables needed while almost preserving the overall accuracy.

To support the soundness of the proposed algorithms, a complexity analysis is also presented. Finally, a new graph refinement technique is introduced in order to boost the quality of the final graph by adding new useful connections between nodes. Compared with other graph refinement techniques, the proposed sorted neighbour propagation achieves the best results, but with extra time effort.

As future work, we are implementing a distributed version of these approaches with the objective of executing them on large-scale datasets.