Concatenation hashing: A relative position preserving method for learning binary codes
Introduction
Nearest neighbor search is widely used in machine learning and computer vision applications. However, with the development of feature representations, images and videos are now represented by high-dimensional feature vectors, and conventional nearest neighbor search methods [1] cannot handle such high-dimensional data. To address this problem, hashing methods have recently been used to perform approximate nearest neighbor (ANN) search efficiently [2], [3], [4], [5]. By mapping high-dimensional data to binary codes and using the Hamming distance to measure data similarity, hashing methods can perform ANN search on large-scale datasets with low storage cost and efficient computation.
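The efficiency argument above rests on the Hamming distance being cheap to compute on binary codes. As a minimal illustration (with toy hand-picked codes, not codes produced by any particular hashing method):

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance: the number of bit positions where two codes differ."""
    return int(np.count_nonzero(a != b))

# Toy 8-bit codes for three database items and one query.
db = np.array([[0, 1, 1, 0, 1, 0, 0, 1],
               [0, 1, 1, 0, 1, 0, 1, 1],
               [1, 0, 0, 1, 0, 1, 1, 0]])
query = np.array([0, 1, 1, 0, 1, 0, 0, 1])

dists = [hamming_distance(query, code) for code in db]  # [0, 1, 8]
nearest = int(np.argmin(dists))                          # index 0
```

In production systems the codes are packed into machine words so that each distance is a handful of XOR and popcount instructions, which is what makes large-scale search feasible.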
Representative hashing methods include locality-sensitive hashing (LSH) [6] and its variants [7], which are data-independent. In LSH, hyperplanes are randomly generated so that similar data are mapped to similar binary codes with high probability. Since data-dependent hashing methods usually achieve better search accuracy than data-independent ones for ANN search, data-dependent methods [8], [9], [10], [11] have become increasingly popular. They can be categorized into unsupervised hashing methods [12], [13] and supervised hashing methods [14], [15], [16].
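The random-hyperplane variant of LSH can be sketched in a few lines: each bit is the sign of a projection onto a random direction, so nearby points agree on most bits while opposite points disagree (dimensions and seeds below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One bit per random hyperplane: the sign of the projection."""
    return (X @ W > 0).astype(np.uint8)

d, K = 16, 8                      # feature dimension, code length
W = rng.standard_normal((d, K))   # K random hyperplanes through the origin

x = rng.standard_normal(d)
x_near = x + 0.01 * rng.standard_normal(d)  # a slightly perturbed copy
x_far = -x                                  # a point in the opposite direction

c_x, c_near, c_far = lsh_hash(np.stack([x, x_near, x_far]), W)
d_near = int(np.count_nonzero(c_x != c_near))  # small: signs rarely flip
d_far = int(np.count_nonzero(c_x != c_far))    # 8: every projection flips sign
```

Because the hyperplanes are drawn without looking at the data, many bits are needed before the collision probabilities translate into good precision, which is the motivation for the data-dependent methods discussed next.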
By incorporating supervised information, supervised hashing methods learn hash functions that preserve the semantic similarity of the data [17], [18], [19], [20]. In contrast, unsupervised hashing methods learn hash functions without supervised information; they generate binary codes that preserve the distribution of the data in Euclidean space [21], [22], [23]. Since obtaining supervised information usually requires considerable manual labor, we focus on unsupervised hashing methods in this paper.
It has been proved that directly learning the optimal binary codes from the data is an NP-hard problem [24]. To avoid this problem, most hashing methods adopt a two-stage strategy consisting of a projection stage and a quantization stage [8], [24]. In the projection stage, the data are projected into a low-dimensional space by projection functions. In the quantization stage, the projected data are quantized into binary codes by the sign function or other quantization functions. Spectral hashing (SH) [24] constructs a graph to describe the relationships among the data and learns projection hyperplanes from the graph to project the data into a low-dimensional space. However, this method cannot scale to large datasets. To address the scalability issue, other graph-based hashing methods [13], [25] approximate the graph using a subset of the data, but they still face the out-of-sample problem. With the development of neural networks, some hashing methods [26], [27] adopt neural networks to project the data into a low-dimensional space and learn binary codes by quantizing the projected values. However, they are usually time-consuming and cannot be deployed on mobile devices.
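The two-stage strategy can be made concrete with a minimal sketch; the projection matrix below is a random placeholder standing in for whatever the projection stage learns (graph-based, PCA-based, or otherwise):

```python
import numpy as np

def project(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Projection stage: map the data into a K-dimensional space."""
    return X @ W

def quantize(Z: np.ndarray) -> np.ndarray:
    """Quantization stage: the sign function turns each dimension into a bit."""
    return (Z > 0).astype(np.uint8)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 32))    # 5 data points with 32-dim features
W = rng.standard_normal((32, 16))   # placeholder projection matrix
codes = quantize(project(X, W))     # 5 binary codes of K = 16 bits each
```

The methods surveyed in this section differ almost entirely in how `W` (or its nonlinear analogue) is obtained; the quantization step is often the same elementwise sign function.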
Principal component analysis (PCA) [8] can learn a limited number of projection hyperplanes from the data to maximally preserve the data information, and it generalizes to unseen data. Hence, some hashing methods [8], [28] generate the projection hyperplanes by PCA in the projection stage and rotate the hyperplanes to minimize the quantization error between the PCA-projected data and the corresponding binary codes in the quantization stage. Although these PCA-based methods can preserve the global structure of the data, they ignore its local neighborhood structure. In Fig. 1(a), data points a, b, c and d lie on a line through the origin. No matter how the hyperplanes are rotated around the origin, a and b cannot be separated, and neither can c and d.
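The rotate-to-reduce-quantization-error idea can be sketched as an alternating scheme in the spirit of iterative quantization: fix the rotation and take the best binary codes, then fix the codes and solve an orthogonal Procrustes problem for the rotation. This is a sketch of the general scheme, not the exact published algorithm of [8] or [28]:

```python
import numpy as np

def pca_rotation_codes(X: np.ndarray, K: int, n_iter: int = 50, seed: int = 0):
    """PCA-project to K dimensions, then alternate between fixing the binary
    codes B and rotating the projections V to shrink ||B - V R||_F."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                           # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Xc @ Vt[:K].T                                 # PCA-projected data
    R = np.linalg.qr(rng.standard_normal((K, K)))[0]  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                   # fix R: best codes are the signs
        U, _, Wt = np.linalg.svd(B.T @ V)    # fix B: orthogonal Procrustes
        R = (U @ Wt).T
    return (V @ R > 0).astype(np.uint8), R

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 32))
codes, R = pca_rotation_codes(X, K=8)   # 100 codes of 8 bits, plus the rotation
```

Note that the rotation stays orthogonal throughout, which is exactly why the failure case in Fig. 1(a) cannot be fixed by this family of methods: rotating around the origin never separates collinear points.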
Clustering-based hashing methods learn hash functions by employing clustering techniques to model the complex relationships among the data. In spherical hashing (SPH) [29], each bit of the binary code is generated by a hypersphere-based hash function that groups spatially coherent data points. K-means hashing (KMH) [12] discovers clusters in the data and learns binary codes for the cluster indices. Since data points in the same cluster receive the same binary code, two similar data points may be assigned to two different clusters and thus encoded into two different binary codes. As shown in Fig. 1(b), although data points b and c are closer than c and d, c and d are in the same cluster while b and c belong to different clusters. The cluster boundaries therefore affect the performance of KMH.
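The boundary effect is easy to see in a stripped-down, KMH-flavored encoder: run k-means and give every point the binary expansion of its cluster index. This is a deliberate simplification (KMH additionally learns the codes so that Hamming distances approximate inter-center distances), but it exhibits the same failure mode at cluster boundaries:

```python
import numpy as np

def kmeans_index_codes(X: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Toy encoder: k-means clustering, then each point's code is the binary
    expansion of its cluster index (all points in a cluster share one code)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):                 # recompute non-empty centers
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    bits = max(1, int(np.ceil(np.log2(k))))
    codes = ((assign[:, None] >> np.arange(bits)) & 1).astype(np.uint8)
    return codes, assign

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((20, 2)) + 10.0,   # blob A
               rng.standard_normal((20, 2)) - 10.0])  # blob B
codes, assign = kmeans_index_codes(X, k=2)
```

Within each blob all points share a code, so two points straddling a boundary between blobs would receive maximally different codes no matter how close they are in Euclidean space.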
Intuitively, if two data points are close to each other, their relative positions to each cluster center are also close. Following this intuition, we propose a new hashing method, concatenation hashing (CH), which learns the binary code of a data point by concatenating substrings learned from its relative positions to the cluster centers. The proposed method simultaneously performs clustering and learns hyperplane-based hash functions that preserve the relative position information of the data in each cluster. Hence, if two data points are close to each other, their substrings in each cluster should be similar. As shown in Fig. 1(c), b and c are closer than any other pair of data points, and they lie on the same side of each hyperplane, which is generated with the corresponding cluster center as the origin. The contributions of this paper are as follows:
- By employing the clustering technique and concatenating the substrings learned by the hash functions in each cluster, the proposed method can model the complex relationships among the data and alleviate the effect of cluster boundaries.
- An alternating optimization is developed to simultaneously discover the cluster structure of the data and learn the hash functions that preserve the relative positions of the data to each cluster center.
- The experiments show that the proposed method is competitive with or better than other unsupervised hashing methods. In particular, when learning long codes to achieve high search precision, the proposed method is clearly superior to the other methods.
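The encoding idea behind CH can be sketched as follows. For each cluster center, the bits of a substring are the signs of hyperplane projections of the point's offset from that center, and the substrings are concatenated across clusters. The centers and hyperplanes below are random placeholders for illustration only; in the paper they are learned jointly by the alternating optimization:

```python
import numpy as np

def ch_style_encode(X: np.ndarray, centers: np.ndarray, Ws: list) -> np.ndarray:
    """For each cluster center c_j, take the signs of hyperplane projections
    of (x - c_j) as a substring, then concatenate the substrings over all
    clusters into one binary code."""
    parts = [((X - c) @ W > 0).astype(np.uint8) for c, W in zip(centers, Ws)]
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(2)
d, m, bits = 8, 4, 4                    # dimension, #clusters, bits per substring
centers = rng.standard_normal((m, d))   # placeholder cluster centers
Ws = [rng.standard_normal((d, bits)) for _ in range(m)]  # placeholder hyperplanes

X = rng.standard_normal((10, d))
codes = ch_style_encode(X, centers, Ws)  # 10 codes of m * bits = 16 bits
```

Because every point contributes a substring relative to every center, two nearby points on opposite sides of a cluster boundary still agree on most bits, unlike the cluster-index codes of KMH.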
Objective function
Assume there is a set of N data points {x_i}, x_i ∈ ℝ^d, forming the columns of the data matrix X ∈ ℝ^(d×N). The goal of the hashing method is to learn the corresponding binary codes {y_i}, y_i ∈ {0, 1}^K, forming the columns of the binary code matrix Y ∈ {0, 1}^(K×N), where K denotes the length of the binary code. As mentioned in Kong et al. [30], since directly learning the binary codes from the data is an NP-hard problem, most hashing methods [8], [13], [29] adopt a two-stage strategy.
Datasets and evaluation protocols
The experiments are performed on the following three datasets.
- (a) CIFAR-10 dataset [36]: CIFAR-10 is a set of 60,000 32 × 32 images, each represented by a 512-dimensional GIST feature [37]. 10,000 images are randomly selected as queries, and the rest are used for training and searching.
- (b) MNIST dataset [38]: MNIST is a set of handwritten digits with a training set of 60,000 images and a test set of 10,000 images. Each image is represented by an 800-dimensional feature vector.
Conclusion and future work
In this paper, we propose a new hashing method that simultaneously clusters the training data and learns hash functions in each cluster. The binary code of a data point is obtained by concatenating the substrings from each cluster. By clustering the data and integrating the information from each cluster, our method can handle data with complex distributions and alleviate the effect of cluster boundaries. Further, to minimize the quantization error between
Declaration of competing interest
We declare that we have no conflict of interest.
Acknowledgement
This work was supported in part by the Shenzhen Municipal Development and Reform Commission (Disciplinary Development Program for Data Science and Intelligent Computing), in part by the Shenzhen international cooperative research project GJHZ20170313150021171, and in part by the NSFC-Shenzhen Robot Joint Fund (U1613215).
Zhenyu Weng is a Ph.D. student in School of Electronics Engineering and Computer Science, Peking University. He received his B.S. degree in Computer Science from Sun Yat-sen University in 2013. His research interests include computer vision, machine learning and multimedia information retrieval.
References (48)
- et al., SCRATCH: A scalable discrete matrix factorization hashing for cross-modal retrieval, ACM on Multimedia Conference (2018)
- et al., Quantization-based hashing: a general framework for scalable image and video retrieval, Pattern Recognit. (2018)
- et al., Adaptive hash retrieval with kernel based similarity, Pattern Recognit. (2018)
- et al., Supervised discrete discriminant hashing for image retrieval, Pattern Recognit. (2018)
- et al., Supervised learning based discrete hashing for image retrieval, Pattern Recognit. (2019)
- et al., An improved density peaks clustering algorithm with fast finding cluster centers, Knowl.-Based Syst. (2018)
- et al., A multiway p-spectral clustering algorithm, Knowl.-Based Syst. (2019)
- et al., Optimised kd-trees for fast image descriptor matching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2008)
- et al., A general two-step approach to learning-based hashing, Proceedings of the IEEE International Conference on Computer Vision (2013)
- et al., Asymmetric distances for binary embeddings, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
- Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions, 47th Annual IEEE Symposium on Foundations of Computer Science
- Kernelized locality-sensitive hashing, IEEE Trans. Pattern Anal. Mach. Intell.
- Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell.
- Deep hashing for compact binary codes learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- A survey on learning to hash, IEEE Trans. Pattern Anal. Mach. Intell.
- Distributed adaptive binary quantization for fast nearest neighbor search, IEEE Trans. Image Process.
- K-means hashing: an affinity-preserving quantization method for learning binary compact codes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Large graph hashing with spectral rotation, Association for the Advancement of Artificial Intelligence
- Column sampling based discrete supervised hashing, Association for the Advancement of Artificial Intelligence
- Deep supervised discrete hashing, Advances in Neural Information Processing Systems
- Deep priority hashing, ACM on Multimedia Conference
- Supervised hashing with kernels, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Discrete graph hashing, Advances in Neural Information Processing Systems
- Toward optimal manifold hashing via discrete locally linear embedding, IEEE Trans. Image Process.
Yuesheng Zhu received his B.Eng. degree in radio engineering, M. Eng. degree in circuits and systems and Ph.D. degree in electronics engineering in 1982, 1989 and 1996, respectively. He is currently working as a professor at the Lab of Communication and Information Security, Shenzhen Graduate School, Peking University. He is a senior member of IEEE, fellow of China Institute of Electronics, and senior member of China Institute of Communications. His interests include digital signal processing, multimedia technology, communication and information security.