FRWCAE: joint faster-RCNN and Wasserstein convolutional auto-encoder for instance retrieval

Zhang, Yi-yang; Feng, Yong; Liu, Da-jiang; Shang, Jia-xing; Qiang, Bao-hua

doi:10.1007/s10489-019-01625-y

Published: 02 March 2020

Volume 50, pages 2208–2221, (2020)

Applied Intelligence

Yi-yang Zhang^1,2, Yong Feng^1,2, Da-jiang Liu^1,2, Jia-xing Shang^1,2 & Bao-hua Qiang^3,4

Abstract

Thanks to the powerful feature extraction capability of deep convolutional neural networks, image-level retrieval methods have outperformed approaches built on hand-crafted features and indexing algorithms. However, people tend to focus on the foreground objects of interest in an image, so locating objects accurately and using object-level features for retrieval become the essential tasks of instance search. In this work, we propose a novel instance retrieval method, FRWCAE, which combines the Faster R-CNN framework for object-level feature extraction with a new Wasserstein Convolutional Auto-encoder for dimensionality reduction. In addition, we propose a category-first spatial re-ranking strategy to improve instance-level retrieval accuracy. Extensive experiments on four large datasets, Oxford 5K, Paris 6K, Oxford 105K and Paris 106K, show that our approach achieves strong performance compared with state-of-the-art methods.
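
The abstract does not spell out how the category-first re-ranking works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each database image already has a compact descriptor (standing in for the auto-encoder output) and a detected category label (standing in for the Faster R-CNN output). The function name `category_first_rerank`, the `top_k` parameter, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record: (image_id, compact descriptor, detected category label).
Entry = Tuple[str, Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_first_rerank(
    query_vec: Sequence[float],
    query_label: str,
    database: List[Entry],
    top_k: int = 3,
) -> List[str]:
    """Rank by descriptor similarity, then promote same-category hits.

    Assumption: "category-first" is read here as moving candidates whose
    detected label matches the query label to the front of the top_k list.
    """
    # Stage 1: initial ranking by cosine similarity of compact descriptors.
    scored = [(cosine(query_vec, vec), img_id, label)
              for img_id, vec, label in database]
    scored.sort(key=lambda item: item[0], reverse=True)

    # Stage 2: re-rank only the top_k candidates. The sort key is False for
    # matching labels, and Python's sort is stable, so similarity order is
    # preserved within the matching and non-matching groups.
    head, tail = scored[:top_k], scored[top_k:]
    head.sort(key=lambda item: item[2] != query_label)
    return [img_id for _, img_id, _ in head + tail]


# Toy data, invented purely for illustration.
db: List[Entry] = [
    ("a", [1.0, 0.0], "tower"),
    ("b", [0.9, 0.1], "bridge"),
    ("c", [0.7, 0.3], "tower"),
    ("d", [0.0, 1.0], "tower"),
]
ranking = category_first_rerank([1.0, 0.0], "tower", db, top_k=3)
print(ranking)  # ['a', 'c', 'b', 'd']
```

In this toy run, image "b" is more similar to the query than "c" but carries a different label, so it drops below "c" within the re-ranked top three. Image "d" lies outside the top three and keeps its place.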

Acknowledgements

Supported by the National Key R&D Program of China (No. 2017YFB1402400), the National Natural Science Foundation of China (No. 61762025), the Guangxi Key Laboratory of Trusted Software (No. kx201701), the Guangxi Key Laboratory of Optoelectronic Information Processing (No. GD18202), and the Frontier and Application Foundation Research Program of CQ CSTC (No. cstc2017jcyjAX0340).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, 400030, China
Yi-yang Zhang, Yong Feng, Da-jiang Liu & Jia-xing Shang
Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing, 400030, China
Yi-yang Zhang, Yong Feng, Da-jiang Liu & Jia-xing Shang
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang
Guangxi Key Laboratory of Optoelectronic Information Processing, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang

Corresponding author

Correspondence to Yong Feng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Yy., Feng, Y., Liu, Dj. et al. FRWCAE: joint faster-RCNN and Wasserstein convolutional auto-encoder for instance retrieval. Appl Intell 50, 2208–2221 (2020). https://doi.org/10.1007/s10489-019-01625-y

Received: 06 May 2019
Revised: 04 December 2019
Accepted: 25 December 2019
Published: 02 March 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s10489-019-01625-y
