Pattern Recognition

Volume 100, April 2020, 107116

Deep reinforcement hashing with redundancy elimination for effective image retrieval

https://doi.org/10.1016/j.patcog.2019.107116

Highlights

  • Block-wise Hash Code Inference is utilized to preserve global similarity relationships over arbitrarily large datasets.

  • Hash Code Mapping based on multi-label binary classification is established so that training proceeds in a point-wise style.

  • Hash Bits De-redundancy based on Deep Reinforcement Learning is developed to eliminate redundant or even harmful bits from hash codes while preserving retrieval accuracy.

Abstract

Hashing is one of the most promising techniques in approximate nearest neighbor search due to its time efficiency and low memory cost. Recently, with the help of deep learning, deep supervised hashing can perform representation learning and compact hash code learning jointly in an end-to-end style, and obtains better retrieval accuracy than non-deep methods. However, most deep hashing methods are trained with a pair-wise or triplet loss in a mini-batch style, which makes them inefficient at data sampling and unable to preserve the global similarity information. Besides, many existing methods generate hash codes with redundant or even harmful bits, which wastes space and may lower the retrieval accuracy. In this paper, we propose a novel deep reinforcement hashing model with redundancy elimination called Deep Reinforcement De-Redundancy Hashing (DRDH), which can fully exploit large-scale similarity information and eliminate redundant hash bits with deep reinforcement learning. DRDH conducts hash code inference in a block-wise style, and uses a Deep Q Network (DQN) to eliminate redundant bits. Very promising results have been achieved on four public datasets, i.e., CIFAR-10, NUS-WIDE, MS-COCO, and Open-Images-V4, which demonstrate that our method can generate highly compact hash codes and yield better retrieval performance than state-of-the-art methods.

Introduction

With the explosive development of social media, enormous amounts of data, including texts, images, and videos, are produced every day. To retrieve them efficiently, multiple methods have been proposed. Recently, Approximate Nearest Neighbor (ANN) search has attracted increasing attention for its high retrieval accuracy and low computational cost. Among various ANN search methods, hashing is one of the most promising: it generates compact hash codes for high-dimensional data points and performs retrieval in Hamming space. The focus of this paper is on learning to hash methods [1], which are data-dependent and can utilize supervised information to generate high-quality hash codes for efficient image retrieval. Usually, data-dependent hashing methods outperform data-independent methods (e.g., Locality Sensitive Hashing (LSH) [2]) by a large margin.

For decades, many hashing methods have been proposed and studied. Recently, with the great success of deep learning, deep hashing methods have attracted increasing research interest. The high fitting capacity of deep neural networks makes it possible to model complex non-linear hash functions. Moreover, deep hashing enables an end-to-end learning style that performs feature learning and hash code learning simultaneously. On many public benchmarks, deep hashing methods yield state-of-the-art performance with much more compact hash codes. As a notable example, robust discrete code modeling for supervised hashing [3] proposed a hashing method that leveraged discrete optimization to get rid of the quantization error and could handle both noisy hash codes and noisy semantic labels.

However, current deep hashing methods have two major disadvantages. First, due to the constraint of computational resources, most deep neural network methods have to be trained in a mini-batch way, which makes them very inefficient at data sampling. Suppose there are $n$ images; for pair-wise hashing methods there are $C_n^2 = \frac{n(n-1)}{2}$ image pairs, which is $O(n^2)$, and for triplet hashing methods there are in total $C_n^3 = \frac{n(n-1)(n-2)}{6}$ image triplets, which is $O(n^3)$. It would take an enormous amount of time to collect enough samples for training. Without enough samples, hashing methods can only preserve local similarity and fail to preserve global similarity, which may hurt the retrieval accuracy. Second, the hash codes generated by most existing methods contain some degree of redundancy. Some bits of the generated hash codes can be thrown away without harming the retrieval accuracy, and the existence of these bits may even decrease it. There are two sources for such redundant or even harmful bits: one is noisy data in the dataset, and the other is the commonly used mini-batch based training strategy, which makes the generated hash codes able to preserve only local similarity relationships.
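
To make this scale concrete, a quick worked example (our own illustration, not a figure from the paper): already for a modest dataset of $n = 10^4$ images,

\[
C_n^2 = \frac{10^4(10^4-1)}{2} \approx 5\times 10^{7} \text{ pairs}, \qquad
C_n^3 = \frac{10^4(10^4-1)(10^4-2)}{6} \approx 1.7\times 10^{11} \text{ triplets},
\]

so a mini-batch sampler can only ever visit a vanishing fraction of all pairs or triplets.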

Based on the above observations, a novel deep reinforcement hashing model with redundancy elimination, called Deep Reinforcement De-Redundancy Hashing (DRDH), is proposed in this paper, which can fully exploit the similarity information in a global way. Our scheme adopts deep reinforcement learning to eliminate the redundancy in generated hash codes. When performing hash code inference, label information is utilized to build a similarity matrix, and a set of hash codes that can reconstruct this similarity matrix is learned. The similarity matrix is calculated in an on-demand block-wise way, so that an arbitrarily large similarity matrix can be handled. When performing hash code mapping, a Deep Neural Network is exploited to map raw images to the previously inferred hash codes. This mapping can be formulated as a multi-label binary classification problem, in which the hash bit mapping can be processed in $O(n)$ and sampling from $O(n^2)$ pairs or $O(n^3)$ triplets is no longer needed. After hash code mapping, Deep Reinforcement Learning is exploited to eliminate redundant hash bits from the hash codes. More specifically, a Deep Q Network (DQN) [4] is leveraged to learn a mask that can mask out those redundant and harmful hash bits. Extensive experiments demonstrate that DRDH can generate compact and de-redundant hash codes, and yields better retrieval performance than state-of-the-art methods on four public datasets: CIFAR-10, NUS-WIDE, MS-COCO, and Open-Images-V4.
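
As a rough illustration of the hash code mapping step, the following minimal PyTorch sketch treats each bit as an independent binary classification target. The names (HashNet, the ResNet-18 backbone, k = 64) are our illustrative assumptions, not the paper's implementation:

```python
# A minimal PyTorch sketch of the hash code mapping stage, treated as
# multi-label binary classification: one binary target per hash bit.
# HashNet, the ResNet-18 backbone, and k = 64 are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn
import torchvision.models as models

class HashNet(nn.Module):
    def __init__(self, k=64):  # k = hash code length
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, k)
        self.backbone = backbone

    def forward(self, x):
        return self.backbone(x)  # k logits, one per hash bit

net = HashNet(k=64)
criterion = nn.BCEWithLogitsLoss()  # one binary classifier per bit

images = torch.randn(8, 3, 224, 224)                # a toy mini-batch
target_bits = torch.randint(0, 2, (8, 64)).float()  # stand-in for inferred codes
loss = criterion(net(images), target_bits)          # point-wise supervision, O(n)
loss.backward()

# At retrieval time the logits are binarized: code = (net(images) > 0).int()
```

Because the supervision is point-wise (one image, one target code), no pair or triplet sampling enters the training loop at all.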

Our contributions to the literature are mainly three-fold: (1) We design a new block-wise similarity calculation manner, which is used to infer a set of hash codes that preserve the global similarity relationship. Existing related methods are either trained in a mini-batch style that can only preserve local similarity relationships, or conduct hash code inference column by column, which is intractable when the dataset is really large. Different from these methods, our proposed block-wise calculation can not only fully exploit the global similarity information but also handle an arbitrarily large similarity matrix. (2) We leverage a Deep Q Network in DRDH to eliminate redundant hash bits, which makes DRDH able to acquire compact hash codes with fewer redundant bits. Some existing works [5], [6], [7] adopt regularization terms in the loss function to avoid redundant hash bits. Although this strategy is effective, it comes at the cost of hurting the overall expressiveness of the generated hash codes, and may reduce the retrieval accuracy. Different from such a redundancy-avoidance strategy, our method aims at a redundancy-elimination mechanism that does not sacrifice the overall retrieval accuracy. (3) We perform extensive experiments on four standard benchmark image datasets to show that DRDH is better than many state-of-the-art methods in a real image retrieval environment.
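
To make the redundancy-elimination idea in contribution (2) concrete, here is a toy sketch of what masking out bits at retrieval time looks like; the dropped indices are arbitrary stand-ins for what the DQN agent in DRDH would actually learn:

```python
# A toy sketch of the redundancy-elimination idea: retrieval with a binary
# mask over hash bits. The dropped indices below are arbitrary stand-ins for
# what the DQN agent in DRDH would actually learn.
import torch

codes_db = torch.randint(0, 2, (1000, 64))   # database hash codes in {0, 1}
code_q = torch.randint(0, 2, (64,))          # one query code
mask = torch.ones(64, dtype=torch.bool)
mask[[3, 17, 42]] = False                    # bits deemed redundant (toy choice)

# Hamming distance computed only over the retained bits
dists = (codes_db[:, mask] != code_q[mask]).sum(dim=1)
ranking = dists.argsort()                    # retrieval ranking without masked bits
```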

The rest of the paper is organized as follows. Section 2 briefly reviews some related works. In Section 3, we describe in detail our new framework of Deep Reinforcement Hashing with Redundancy Elimination. Section 4 presents our experimental results and analyses, and we conclude the paper in Section 5.

Related works

Existing hashing methods can be roughly divided into two categories, i.e., unsupervised and supervised.

Unsupervised hashing methods such as LSH [2] and its variants aim at learning compact hash codes for unlabeled data. LSH adopts the random projection as a hash function, which usually needs long hash codes (≥128 bits) to achieve sufficient accuracy. SADH [8] established an unsupervised hashing framework, named Similarity-Adaptive Deep Hashing, which alternatingly proceeded over three training
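
For reference, the random-projection hash family that LSH [2] builds on can be sketched in a few lines of NumPy; the dimensions and seed here are arbitrary choices for illustration:

```python
# A few-line sketch of the random-projection hash family behind LSH [2]:
# each bit is the sign of a random linear projection. Dimensions and the
# seed are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
D, k = 512, 128                    # feature dim; note LSH typically needs long codes
W = rng.standard_normal((D, k))    # one random hyperplane per bit

def lsh_hash(x):
    # x: (n, D) feature matrix -> (n, k) binary codes
    return (x @ W > 0).astype(np.uint8)

codes = lsh_hash(rng.standard_normal((10, D)))
```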

Formulation

In a learning to hash task, a set of training images is represented as feature vectors $X = \{x_i\}_{i=1}^{n} \in \mathbb{R}^{n \times D}$, where $x_i$ can be a shallow feature, a deep feature, or raw pixels. For supervised hashing, each image is annotated with semantic labels $Y = \{y_i\}_{i=1}^{n} \in \{0,1\}^{n \times m}$, where $m$ is the total number of semantic categories. $y_{ij} = 1$ means that the $i$th image belongs to the $j$th category, and an image can belong to multiple categories. To express the similarity relationship between two images, a similarity
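
A minimal sketch of the block-wise construction this formulation leads to, assuming the common convention that two images are similar iff they share at least one label (function and variable names are ours):

```python
# A minimal sketch of computing one block of the label-derived similarity
# matrix on demand, assuming the common convention S_ij = 1 iff images i and
# j share at least one label. Function and variable names are ours.
import torch

def similarity_block(Y, rows, cols):
    # Y: (n, m) multi-hot label matrix; rows/cols index one block of S
    Yr, Yc = Y[rows].float(), Y[cols].float()
    return (Yr @ Yc.t() > 0).float()   # (|rows|, |cols|) block, never the full n x n

Y = torch.randint(0, 2, (10000, 21))   # toy labels, m = 21 categories
S_block = similarity_block(Y, torch.arange(0, 256), torch.arange(256, 512))
```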

Experiments

We evaluate our deep reinforcement hashing with redundancy elimination approach, DRDH, against several state-of-the-art hashing methods on four standard benchmark datasets. All the related experiments are implemented with the deep learning library PyTorch on a single NVIDIA GTX 1080 Ti GPU.
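
For context, such comparisons are commonly scored with mean Average Precision over Hamming ranking; the sketch below is a generic version of that protocol (our simplification; details such as top-K truncation vary across papers):

```python
# A generic sketch of the Hamming-ranking mean Average Precision (mAP)
# protocol commonly used to score hashing methods; our simplification,
# details such as top-K truncation vary across papers.
import torch

def mean_average_precision(q_codes, db_codes, q_labels, db_labels):
    # codes: {0, 1} tensors; labels: multi-hot tensors (shared label = relevant)
    aps = []
    for code, label in zip(q_codes, q_labels):
        dist = (db_codes != code).sum(dim=1)                  # Hamming distances
        order = dist.argsort()
        rel = ((db_labels[order].float() @ label.float()) > 0).float()
        if rel.sum() == 0:
            continue                                          # query with no match
        prec = rel.cumsum(0) / torch.arange(1, len(rel) + 1)  # precision@rank
        aps.append((prec * rel).sum() / rel.sum())
    return torch.stack(aps).mean()
```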

Conclusion and future work

This paper tackles two key problems that exist in most deep learning to hash methods. The first is that they are usually trained in a mini-batch style, which makes them inefficient at data sampling and unable to preserve the global similarity relationship. We solve this problem by proposing a block-wise hash code inference, which can directly infer optimal hash codes from the large-scale similarity information. The second is that most hashing methods generate hash codes with redundant bits. We

Acknowledgments

This work was supported by National Natural Science Foundation of China (No. 61976057, No. 61572140), Shanghai Municipal R&D Foundation (No. 17DZ1100504, No. 16JC1420401), Shanghai Natural Science Foundation (No. 19ZR1417200), and Humanities and Social Sciences Planning Fund of Ministry of Education of China (No. 19YJA630116). Weiguo Fan is supported by the Henry Tippie Endowed Chair Fund from the University of Iowa. Yuejie Zhang and Tao Zhang are corresponding authors.

References

  • A. Gionis et al., Similarity search in high dimensions via hashing

  • V. Mnih et al., Playing Atari with deep reinforcement learning

  • J.K. Song et al., Deep region hashing for efficient large-scale instance search from images

  • Z.F. Qiu et al., Deep semantic hashing with generative adversarial networks

  • F.M. Shen et al., Unsupervised deep hashing with similarity-adaptive and discrete optimization

  • W. Liu et al., Supervised hashing with kernels

  • P. Zhang et al., Supervised hashing with latent factor models

  • G. Lin et al., Fast supervised hashing with decision trees for high-dimensional data

  • F. Shen et al., Supervised discrete hashing

  • X. Zhou et al., Graph convolutional network hashing, IEEE Trans. Cybern. (2018)

  • H. Zhu et al., Deep hashing network for efficient similarity retrieval

  • E.K. Yang et al., Pairwise relationship guided deep hashing for cross-modal retrieval

  • Z.J. Cao et al., Deep learning to hash by continuation

  • H.M. Liu et al., Deep supervised hashing for fast image retrieval

  • F. Cakir et al., MIHash: online hashing with mutual information

  • F.M. Shen et al., Deep asymmetric pairwise hashing

  • X.F. Zhe, S.F. Chen, H. Yan, Deep class-wise hashing: semantics-preserving hashing via class-wise loss, arXiv,...

Juexu Yang received the B.S. degree in Computer Science from Wuhan University of Technology, Wuhan, China, in 2015. He is currently a master's student in the School of Computer Science, Fudan University, Shanghai, China. He is a member of the Institute of Media Computing in the School of Computer Science. His research interests include deep hashing, discrete optimization, and computer vision.

Yuejie Zhang received the B.S. degree in Computer Software, the M.S. degree in Computer Application, and the Ph.D. degree in Computer Software and Theory from Northeastern University, Shenyang, China, in 1994, 1997, and 1999, respectively. She was a Postdoctoral Researcher at Fudan University, Shanghai, China, from 1999 to 2001. In 2001, she joined the Department of Computer Science and Engineering (now School of Computer Science), Fudan University, as an Assistant Professor, and then became Associate Professor and Full Professor. Her research interests include multimedia/cross-media information analysis, processing, and retrieval, and machine learning.

Rui Feng received the B.S. degree in Industrial Automation from Harbin Engineering University, Harbin, China, in 1994, the M.S. degree in Industrial Automation from Northeastern University, Shenyang, China, in 1997, and the Ph.D. degree in Control Theory and Engineering from Shanghai Jiaotong University, Shanghai, China, in 2003. In 2003, he joined the Department of Computer Science and Engineering (now School of Computer Science), Fudan University, as an Assistant Professor, and then became Associate Professor and Full Professor. His research interests include multimedia information analysis and processing, and machine learning.

Tao Zhang received the B.S. and M.S. degrees in Automation Control and the Ph.D. degree in System Engineering from Northeastern University, Shenyang, China, in 1992, 1997, and 2000, respectively. He was a Postdoctoral Researcher at Fudan University, Shanghai, China, from 2001 to 2003. In 2003, he joined the School of Information Management and Engineering, Shanghai University of Finance and Economics, as an Associate Professor, and then became Full Professor. His research interests include big data analysis and mining, and system modeling and optimization.

Weiguo Fan received the B.S. degree in Information and Control Engineering from Xi'an Jiaotong University, Xi'an, China, in 1995, the M.S. degree in Computer Science from the National University of Singapore in 1997, and the Ph.D. degree in AI and Information Systems from the University of Michigan, Ann Arbor, in 2002. He is currently the Henry Tippie Chaired Professor of Business Analytics at the University of Iowa. He has published more than 200 refereed articles in many premier IT/IS journals and conferences, such as TKDE, PR, TOIT, WWW, SIGIR, CIKM, AAAI, and KDD. His research interests include information retrieval, data mining, text mining, Web mining, and pattern recognition.
