
Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning

Published in the International Journal of Computer Vision.

Abstract

Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen ones so that the latter can be recognised without any training samples. This is made possible by learning a projection function between a feature space and a semantic space (e.g. an attribute space). When the seen and unseen classes are viewed as two domains, a large domain gap often exists between them, which challenges ZSL. In this work, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap. Specifically, we first build a class hierarchy consisting of multiple superclass layers and a single class layer, where the superclasses are generated automatically by data-driven clustering over the semantic representations of all seen and unseen class names. We then exploit the superclasses from the class hierarchy to tackle the domain gap challenge in two respects: deep feature learning and projection function learning. First, to narrow the domain gap in the feature space, we define a recurrent neural network over the superclasses and plug it into a convolutional neural network to enforce the superclass hierarchy. Second, to learn a transferrable projection function for ZSL, we propose a novel projection function learning method that exploits the superclasses to align the two domains. Importantly, our transferrable feature and projection learning methods can be easily extended to a closely related task: few-shot learning (FSL). Extensive experiments show that the proposed model outperforms state-of-the-art alternatives on both ZSL and FSL tasks.
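Two ingredients named in the abstract can be illustrated concretely: (1) clustering the semantic embeddings of class names to form a superclass layer, and (2) learning a projection from visual features into the semantic space so that unseen classes can be matched by nearest neighbour. The sketch below is NOT the paper's actual method (which aligns domains via the superclass hierarchy); it uses random toy data, plain k-means for one superclass layer, and ridge regression as a stand-in for the projection function. All names (`kmeans`, `lam`, `W`) and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semantic space: 10 classes (seen + unseen), 50-d "word vectors".
# In practice these would be word embeddings of the class names.
num_classes, sem_dim = 10, 50
class_vecs = rng.normal(size=(num_classes, sem_dim))

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; each resulting cluster plays the role of one superclass."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

# One superclass layer with 3 superclasses over all class embeddings;
# repeating k-means over the centers would add further hierarchy layers.
super_labels, super_centers = kmeans(class_vecs, k=3)

# Projection learning, reduced to ridge regression for illustration:
# map visual features X of seen-class images to their class embeddings S.
feat_dim, n_seen_imgs = 64, 200
X = rng.normal(size=(n_seen_imgs, feat_dim))
seen_ids = rng.integers(0, 7, size=n_seen_imgs)   # classes 0..6 are "seen"
S = class_vecs[seen_ids]
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ S)  # (64, 50)

# ZSL prediction: project a test feature into the semantic space and
# nearest-neighbour match against the unseen-class embeddings (7..9).
x_test = rng.normal(size=feat_dim)
proj = x_test @ W
unseen = class_vecs[7:]
pred = 7 + int(((unseen - proj) ** 2).sum(1).argmin())
```

The nearest-neighbour step is the standard inductive ZSL test protocol; the paper's contribution is in how the features and the projection are regularised by the superclass hierarchy before this step.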


(Figures 1–7 appear in the full-text article.)


Notes

  1. Note that each superclass inherits training image samples only from its seen classes; consequently, our CNN-RNN model receives no training samples for superclasses that contain only unseen classes.

  2. Our CNN-RNN model can still extract transferrable features for these test unseen-class samples because the corresponding unseen classes share higher-level superclasses with some seen classes.

  3. The class hierarchy is available at http://www.image-net.org.


Acknowledgements

This work was supported in part by National Key R&D Program of China (2018YFB1402600), National Natural Science Foundation of China (61976220, 61573026, 61832017), and Beijing Natural Science Foundation (L172037).

Author information


Corresponding author

Correspondence to Zhiwu Lu.

Additional information

Communicated by Cristian Sminchisescu.



About this article


Cite this article

Li, A., Lu, Z., Guan, J. et al. Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning. Int J Comput Vis 128, 2810–2827 (2020). https://doi.org/10.1007/s11263-020-01342-x
