FRWCAE: joint faster-RCNN and Wasserstein convolutional auto-encoder for instance retrieval

Applied Intelligence

Abstract

Building on the powerful feature extraction capability of deep convolutional neural networks, image-level retrieval methods have achieved superior performance compared to hand-crafted features and indexing algorithms. However, people tend to focus on the foreground objects of interest in an image, so locating objects accurately and using object-level features for retrieval become the essential tasks of instance search. In this work, we propose a novel instance retrieval method, FRWCAE, which combines the Faster R-CNN framework for object-level feature extraction with a new Wasserstein Convolutional Auto-encoder for dimensionality reduction. In addition, we propose a category-first spatial re-ranking strategy to improve instance-level retrieval accuracy. Extensive experiments on four large datasets (Oxford 5K, Paris 6K, Oxford 105K and Paris 106K) show that our approach achieves significant performance compared with state-of-the-art methods.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFB1402400), the National Natural Science Foundation of China (No. 61762025), the Guangxi Key Laboratory of Trusted Software (No. kx201701), the Guangxi Key Laboratory of Optoelectronic Information Processing (No. GD18202), and the Frontier and Application Foundation Research Program of CQ CSTC (No. cstc2017jcyjAX0340).

Author information

Corresponding author

Correspondence to Yong Feng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Yy., Feng, Y., Liu, Dj. et al. FRWCAE: joint faster-RCNN and Wasserstein convolutional auto-encoder for instance retrieval. Appl Intell 50, 2208–2221 (2020). https://doi.org/10.1007/s10489-019-01625-y

  • DOI: https://doi.org/10.1007/s10489-019-01625-y
