
Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning

Published in the International Journal of Computer Vision.

Abstract

Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen ones so that the latter can be recognised without any training samples. This is made possible by learning a projection function between a feature space and a semantic space (e.g. an attribute space). When the seen and unseen classes are viewed as two domains, a large domain gap often exists between them, which challenges ZSL. In this work, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap. Specifically, we first build a class hierarchy consisting of multiple superclass layers and a single class layer, where the superclasses are generated automatically by data-driven clustering over the semantic representations of all seen and unseen class names. We then exploit the superclasses from the class hierarchy to tackle the domain gap challenge in two respects: deep feature learning and projection function learning. First, to narrow the domain gap in the feature space, we define a recurrent neural network over the superclasses and plug it into a convolutional neural network to enforce the superclass hierarchy. Second, to learn a transferrable projection function for ZSL, we propose a novel projection function learning method that exploits the superclasses to align the two domains. Importantly, our transferrable feature and projection learning methods can be easily extended to a closely related task: few-shot learning (FSL). Extensive experiments show that the proposed model outperforms state-of-the-art alternatives on both ZSL and FSL tasks.
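Two ingredients named in the abstract can be illustrated concretely: (1) clustering the semantic embeddings of class names to form a superclass layer, and (2) learning a projection from visual features into the semantic space so that unseen classes can be matched by nearest neighbour. The sketch below is NOT the paper's actual method (which aligns domains via the superclass hierarchy); it uses random toy data, plain k-means for one superclass layer, and ridge regression as a stand-in for the projection function. All names (`kmeans`, `lam`, `W`) and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semantic space: 10 classes (seen + unseen), 50-d "word vectors".
# In practice these would be word embeddings of the class names.
num_classes, sem_dim = 10, 50
class_vecs = rng.normal(size=(num_classes, sem_dim))

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; each resulting cluster plays the role of one superclass."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

# One superclass layer with 3 superclasses over all class embeddings;
# repeating k-means over the centers would add further hierarchy layers.
super_labels, super_centers = kmeans(class_vecs, k=3)

# Projection learning, reduced to ridge regression for illustration:
# map visual features X of seen-class images to their class embeddings S.
feat_dim, n_seen_imgs = 64, 200
X = rng.normal(size=(n_seen_imgs, feat_dim))
seen_ids = rng.integers(0, 7, size=n_seen_imgs)   # classes 0..6 are "seen"
S = class_vecs[seen_ids]
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ S)  # (64, 50)

# ZSL prediction: project a test feature into the semantic space and
# nearest-neighbour match against the unseen-class embeddings (7..9).
x_test = rng.normal(size=feat_dim)
proj = x_test @ W
unseen = class_vecs[7:]
pred = 7 + int(((unseen - proj) ** 2).sum(1).argmin())
```

The nearest-neighbour step is the standard inductive ZSL test protocol; the paper's contribution is in how the features and the projection are regularised by the superclass hierarchy before this step.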


(Figures 1–7 appear in the full-text article.)


Notes

  1. Note that each superclass inherits training image samples only from its seen classes; consequently, our CNN-RNN model receives no training samples for superclasses that contain only unseen classes.

  2. Our CNN-RNN model can still extract transferrable features for these test unseen-class samples because the corresponding unseen classes share higher-level superclasses with some seen classes.

  3. The class hierarchy is available at http://www.image-net.org.


Acknowledgements

This work was supported in part by National Key R&D Program of China (2018YFB1402600), National Natural Science Foundation of China (61976220, 61573026, 61832017), and Beijing Natural Science Foundation (L172037).

Author information


Corresponding author

Correspondence to Zhiwu Lu.

Additional information

Communicated by Cristian Sminchisescu.



About this article


Cite this article

Li, A., Lu, Z., Guan, J. et al. Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning. Int J Comput Vis 128, 2810–2827 (2020). https://doi.org/10.1007/s11263-020-01342-x
