Zero-shot recognition with latent visual attributes learning

Xie, Yurui; He, Xiaohai; Zhang, Jing; Luo, Xiaodong

doi:10.1007/s11042-020-09316-4

Published: 24 July 2020

Volume 79, pages 27321–27335, (2020)

Multimedia Tools and Applications

Yurui Xie¹,², Xiaohai He¹, Jing Zhang¹ & Xiaodong Luo¹

Abstract

Zero-shot learning (ZSL) aims to recognize novel object categories by transferring knowledge extracted from the seen categories (source domain) to the unseen categories (target domain). Most recent ZSL methods concentrate on learning a visual-semantic alignment that bridges image features and their semantic representations, relying solely on human-designed attributes. However, few works study whether the human-designed attributes are discriminative enough for the recognition task. To address this problem, we propose a couple semantic dictionaries (CSD) learning approach that exploits latent visual attributes and aligns the visual and semantic spaces at the same time. Specifically, the learned visual attributes are incorporated into the semantic representation of image features, which consolidates the discriminative visual cues for object recognition. In addition, existing ZSL methods suffer from the domain shift issue because the source and target domains have completely separate label spaces. We therefore use the visual-semantic alignment and the latent visual attributes learned from the source domain jointly to regularize learning in the target domain, which ensures that information transfer extends across domains. We formulate this as an optimization problem with a unified objective and propose an iterative solver. Extensive experiments on two challenging benchmark datasets demonstrate that our proposed approach outperforms several state-of-the-art ZSL methods.
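
To make the visual-semantic alignment step concrete, the following is a minimal sketch of a generic attribute-based ZSL baseline. It is not the CSD method described above: it assumes a single linear map fitted by ridge regression, nearest-prototype classification by cosine similarity, and purely synthetic data. All function names, parameters, and data dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_visual_to_semantic(X, S, lam=1.0):
    """Ridge regression: W = argmin ||X W - S||_F^2 + lam * ||W||_F^2.

    X holds image features (n_samples x n_feat) and S holds the attribute
    vector of each sample's class (n_samples x n_attr).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)


def predict_unseen(X_test, W, unseen_protos):
    """Assign each test sample to the unseen class whose attribute vector
    has the highest cosine similarity with the projected feature."""
    proj = X_test @ W
    proj = proj / (np.linalg.norm(proj, axis=1, keepdims=True) + 1e-12)
    protos = unseen_protos / (
        np.linalg.norm(unseen_protos, axis=1, keepdims=True) + 1e-12
    )
    return np.argmax(proj @ protos.T, axis=1)


def run_demo(seed=0, n_seen=20, n_unseen=5, n_attr=8, n_feat=32, per_class=30):
    """Train on seen classes only, then classify samples of unseen classes.

    Synthetic assumption: features are a noisy linear function of the
    class attribute vector. Real image features are far less well behaved.
    """
    rng = np.random.default_rng(seed)
    protos = rng.normal(size=(n_seen + n_unseen, n_attr))  # class attributes
    mixing = rng.normal(size=(n_attr, n_feat))  # attribute-to-feature map
    labels = np.repeat(np.arange(n_seen + n_unseen), per_class)
    noise = 0.01 * rng.normal(size=(labels.size, n_feat))
    X = protos[labels] @ mixing + noise

    seen = labels < n_seen  # source domain: seen classes only
    W = fit_visual_to_semantic(X[seen], protos[labels[seen]])

    # Target domain: label space is disjoint from the training classes.
    pred = predict_unseen(X[~seen], W, protos[n_seen:])
    return float(np.mean(pred == labels[~seen] - n_seen))


print("unseen-class accuracy:", run_demo())
```

The map `W` is fitted without any sample from the unseen classes, so recognition in the target domain depends entirely on how well the attribute vectors describe the visual features. That dependence is the limitation of human-designed attributes that the abstract raises.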

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61806028) and the Program for Educational Foundation of Sichuan Province, China (No. 18ZB0125), and in part by the Industrial Cluster Collaborative Innovation Project of Chengdu (No. 2016-XT00-00015-GX) and the Sichuan Science and Technology Program (No. 2018HH0143).

Author information

Authors and Affiliations

College of Electronics and Information Engineering, Sichuan University, Chengdu, China
Yurui Xie, Xiaohai He, Jing Zhang & Xiaodong Luo
Chengdu University of Information Technology, Chengdu, China
Yurui Xie

Corresponding author

Correspondence to Xiaohai He.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xie, Y., He, X., Zhang, J. et al. Zero-shot recognition with latent visual attributes learning. Multimed Tools Appl 79, 27321–27335 (2020). https://doi.org/10.1007/s11042-020-09316-4

Received: 19 July 2019
Revised: 23 April 2020
Accepted: 09 July 2020
Published: 24 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11042-020-09316-4
