
Unified Binary Generative Adversarial Network for Image Retrieval and Compression

Published in: International Journal of Computer Vision

Abstract

Binary codes have often been deployed to facilitate large-scale retrieval tasks, but far less often for image compression. In this paper, we propose a unified framework, BGAN+, that restricts the input noise variable of generative adversarial networks to be binary and conditioned on the features of each input image, and simultaneously learns two binary representations per image: one for image retrieval and the other for image compression. Compared to related methods that attempt to learn a single binary code serving both purposes, we demonstrate that learning two codes leads to more effective representations, since fewer compromises are needed when balancing the two sets of requirements. The added value of a unified framework over two separate frameworks lies in the synergy in data representation that benefits both learning processes. In devising this framework, we also address another challenge in learning binary codes, namely that of supervision. While the most striking successes in image retrieval using binary codes have mostly involved discriminative models requiring labels, the proposed BGAN+ framework learns the binary codes in an unsupervised fashion, yet more effectively than state-of-the-art supervised approaches. The proposed BGAN+ framework is evaluated on three benchmark datasets for image retrieval and two datasets for image compression. The experimental results show that BGAN+ outperforms existing retrieval methods by significant margins and achieves promising performance for image compression, especially at low bit rates.
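To make the retrieval side of the abstract concrete, the following is a minimal sketch of the general idea behind binary codes for retrieval: an image feature is mapped through an encoder and binarized with a sign function, and neighbors are found by Hamming distance. This is not the BGAN+ network itself; the LSH-style random-projection "encoder" here is a hypothetical stand-in for the learned generator/encoder described in the paper.

```python
import random


def sign_binarize(v):
    # sgn(.) binarization used by most hashing methods: +1 / -1 per dimension.
    return [1 if x >= 0 else -1 for x in v]


def random_projection_code(feature, n_bits, seed=0):
    # Hypothetical stand-in for a learned encoder: project the feature onto
    # n_bits random Gaussian hyperplanes, then binarize (classic LSH).
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in feature] for _ in range(n_bits)]
    return sign_binarize(
        [sum(w * x for w, x in zip(plane, feature)) for plane in planes]
    )


def hamming(a, b):
    # Number of differing bits; the retrieval distance for binary codes.
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    query = random_projection_code([0.5, -0.2, 0.8, 0.1], n_bits=16)
    near = random_projection_code([0.52, -0.18, 0.79, 0.12], n_bits=16)
    print(hamming(query, near))  # similar inputs tend to yield small distances
```

Ranking a database by Hamming distance to the query code is what makes binary codes attractive at scale: the distance is a cheap XOR-and-popcount operation, and the codes themselves are tiny to store.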




Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2019J073), the National Natural Science Foundation of China (Grant Nos. 61772116, 61872064, 61632007 and 61602049), and the Open Project of Zhejiang Lab (Grant No. 2019KD0AB05).

Author information


Correspondence to Lianli Gao or Heng Tao Shen.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Song, J., He, T., Gao, L. et al. Unified Binary Generative Adversarial Network for Image Retrieval and Compression. Int J Comput Vis 128, 2243–2264 (2020). https://doi.org/10.1007/s11263-020-01305-2
