Zero-shot recognition with latent visual attributes learning

Multimedia Tools and Applications

Abstract

Zero-shot learning (ZSL) aims to recognize novel object categories by transferring knowledge extracted from the seen categories (source domain) to the unseen categories (target domain). Most recent ZSL methods concentrate on learning a visual-semantic alignment that bridges image features and their semantic representations, relying solely on human-designed attributes. However, few works study whether the human-designed attributes are discriminative enough for the recognition task. To address this problem, we propose a couple semantic dictionaries (CSD) learning approach that exploits latent visual attributes and aligns the visual and semantic spaces at the same time. Specifically, the learned visual attributes are incorporated into the semantic representation of the image features, where they consolidate the discriminative visual cues for object recognition. In addition, existing ZSL methods suffer from the domain shift issue because the source and target domains have completely separate label spaces. We therefore use the visual-semantic alignment and the latent visual attributes learned from the source domain jointly to regularise the learning of the target domain, which ensures the extensibility of information transfer across domains. We formulate this as an optimization problem with a unified objective and propose an iterative solver. Extensive experiments on two challenging benchmark datasets demonstrate that our proposed approach outperforms several state-of-the-art ZSL methods.
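
The abstract does not state the CSD objective or its solver, so the following is only a generic illustration of the kind of procedure it describes: two dictionaries, one visual and one semantic, tied together by shared codes and fitted by alternating closed-form updates. The objective, the ridge penalties, the parameters `lam`, `mu` and `gamma`, and the function names `fit_coupled_dictionaries` and `predict_unseen` are all assumptions made for this sketch. It does not model the latent visual attributes or the target-domain regularisation of the paper.

```python
import numpy as np


def objective(X, A, Dv, Ds, Z, lam, mu, gamma):
    """Assumed objective: two reconstruction terms plus ridge penalties."""
    return (np.sum((X - Dv @ Z) ** 2)
            + lam * np.sum((A - Ds @ Z) ** 2)
            + mu * np.sum(Z ** 2)
            + gamma * (np.sum(Dv ** 2) + np.sum(Ds ** 2)))


def fit_coupled_dictionaries(X, A, n_atoms, lam=1.0, mu=0.1, gamma=0.1,
                             n_iters=20, seed=0):
    """Alternating minimisation over shared codes Z and dictionaries Dv, Ds.

    X: (d, n) visual features of seen-class images, one column per image.
    A: (k, n) attribute vector of each image's class, one column per image.
    """
    rng = np.random.default_rng(seed)
    d, k = X.shape[0], A.shape[0]
    Dv = rng.standard_normal((d, n_atoms))
    Ds = rng.standard_normal((k, n_atoms))
    I = np.eye(n_atoms)
    history = []
    for _ in range(n_iters):
        # Codes: exact minimiser with both dictionaries held fixed.
        Z = np.linalg.solve(Dv.T @ Dv + lam * Ds.T @ Ds + mu * I,
                            Dv.T @ X + lam * Ds.T @ A)
        G = Z @ Z.T
        # Dictionaries: exact ridge-regression minimisers with Z held fixed.
        Dv = np.linalg.solve(G + gamma * I, Z @ X.T).T
        Ds = np.linalg.solve(lam * G + gamma * I, lam * (Z @ A.T)).T
        history.append(objective(X, A, Dv, Ds, Z, lam, mu, gamma))
    return Dv, Ds, Z, history


def predict_unseen(X_test, Dv, Ds, prototypes, mu=0.1):
    """Code test features visually, map to attribute space, pick nearest class.

    prototypes: (k, c) attribute vectors of the c unseen classes.
    Returns an array of c-class indices, one per column of X_test.
    """
    I = np.eye(Dv.shape[1])
    Z = np.linalg.solve(Dv.T @ Dv + mu * I, Dv.T @ X_test)
    S = Ds @ Z  # predicted attribute vectors, shape (k, m)
    S = S / (np.linalg.norm(S, axis=0, keepdims=True) + 1e-12)
    P = prototypes / (np.linalg.norm(prototypes, axis=0, keepdims=True) + 1e-12)
    return np.argmax(P.T @ S, axis=0)  # cosine similarity to each prototype
```

Because each update is the exact minimiser of the assumed objective over one block of variables, the values recorded in `history` cannot increase from one sweep to the next, which is the usual sanity check for an iterative solver of this type.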

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61806028) and the Program for Educational Foundation of Sichuan Province, China (No. 18ZB0125), and in part by the Industrial Cluster Collaborative Innovation Project of Chengdu (No. 2016-XT00-00015-GX) and the Sichuan Science and Technology Program (No. 2018HH0143).

Author information

Corresponding author

Correspondence to Xiaohai He.

About this article

Cite this article

Xie, Y., He, X., Zhang, J. et al. Zero-shot recognition with latent visual attributes learning. Multimed Tools Appl 79, 27321–27335 (2020). https://doi.org/10.1007/s11042-020-09316-4

  • DOI: https://doi.org/10.1007/s11042-020-09316-4
