Abstract
Building on the powerful feature extraction capability of deep convolutional neural networks, image-level retrieval methods have outperformed approaches based on hand-crafted features and indexing algorithms. However, people tend to focus on the foreground objects of interest in an image, so locating objects accurately and retrieving with object-level features are the essential tasks of instance search. In this work, we propose FRWCAE, a novel instance retrieval method that combines the Faster R-CNN framework for object-level feature extraction with a new Wasserstein Convolutional Auto-encoder for dimensionality reduction. In addition, we propose a category-first spatial re-ranking strategy to improve instance-level retrieval accuracy. Extensive experiments on four large datasets, Oxford 5K, Paris 6K, Oxford 105K and Paris 106K, show that our approach achieves significant performance gains compared with state-of-the-art methods.
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2017YFB1402400), the National Natural Science Foundation of China (No. 61762025), the Guangxi Key Laboratory of Trusted Software (No. kx201701), the Guangxi Key Laboratory of Optoelectronic Information Processing (No. GD18202), and the Frontier and Application Foundation Research Program of CQ CSTC (No. cstc2017jcyjAX0340).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, Yy., Feng, Y., Liu, Dj. et al. FRWCAE: joint faster-RCNN and Wasserstein convolutional auto-encoder for instance retrieval. Appl Intell 50, 2208–2221 (2020). https://doi.org/10.1007/s10489-019-01625-y
DOI: https://doi.org/10.1007/s10489-019-01625-y