
Unified Binary Generative Adversarial Network for Image Retrieval and Compression

Published in: International Journal of Computer Vision

Abstract

Binary codes have often been deployed to facilitate large-scale retrieval tasks, but far less often for image compression. In this paper, we propose a unified framework, BGAN+, that restricts the input noise variable of generative adversarial networks to be binary and conditioned on the features of each input image, and simultaneously learns two binary representations per image: one for image retrieval and the other for image compression. Compared to related methods that attempt to learn a single binary code serving both purposes, we demonstrate that learning two codes leads to more effective representations, since fewer compromises are needed when balancing the two sets of requirements. The added value of a unified framework over two separate frameworks lies in the synergy in data representation that benefits both learning processes. In devising this framework, we also address another challenge in learning binary codes, namely that of supervision. While the most striking successes in image retrieval using binary codes have mostly involved discriminative models requiring labels, the proposed BGAN+ framework learns the binary codes in an unsupervised fashion, yet more effectively than state-of-the-art supervised approaches. The proposed BGAN+ framework is evaluated on three benchmark datasets for image retrieval and two datasets for image compression. The experimental results show that BGAN+ outperforms existing retrieval methods by significant margins and achieves promising performance for image compression, especially at low bit rates.
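To make the retrieval side of the abstract concrete, the following is a minimal sketch of the general idea behind binary codes for retrieval: an image feature is mapped through an encoder and binarized with a sign function, and neighbors are found by Hamming distance. This is not the BGAN+ network itself; the LSH-style random-projection "encoder" here is a hypothetical stand-in for the learned generator/encoder described in the paper.

```python
import random


def sign_binarize(v):
    # sgn(.) binarization used by most hashing methods: +1 / -1 per dimension.
    return [1 if x >= 0 else -1 for x in v]


def random_projection_code(feature, n_bits, seed=0):
    # Hypothetical stand-in for a learned encoder: project the feature onto
    # n_bits random Gaussian hyperplanes, then binarize (classic LSH).
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in feature] for _ in range(n_bits)]
    return sign_binarize(
        [sum(w * x for w, x in zip(plane, feature)) for plane in planes]
    )


def hamming(a, b):
    # Number of differing bits; the retrieval distance for binary codes.
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    query = random_projection_code([0.5, -0.2, 0.8, 0.1], n_bits=16)
    near = random_projection_code([0.52, -0.18, 0.79, 0.12], n_bits=16)
    print(hamming(query, near))  # similar inputs tend to yield small distances
```

Ranking a database by Hamming distance to the query code is what makes binary codes attractive at scale: the distance is a cheap XOR-and-popcount operation, and the codes themselves are tiny to store.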




Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2019J073), the National Natural Science Foundation of China (Grant Nos. 61772116, 61872064, 61632007 and 61602049), and the Open Project of Zhejiang Lab (Grant No. 2019KD0AB05).

Author information


Correspondence to Lianli Gao or Heng Tao Shen.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Song, J., He, T., Gao, L. et al. Unified Binary Generative Adversarial Network for Image Retrieval and Compression. Int J Comput Vis 128, 2243–2264 (2020). https://doi.org/10.1007/s11263-020-01305-2
