
Cross-modal subspace learning via kernel correlation maximization and discriminative structure-preserving

Published in: Multimedia Tools and Applications

Abstract

Measuring the distance between heterogeneous data remains an open problem. Many works learn a common subspace in which the similarity between different modalities can be computed directly. However, most existing methods focus on learning a latent subspace without preserving the semantically structural information, and therefore fall short of the desired results. In this paper, we propose a novel framework, termed Cross-modal subspace learning via Kernel correlation maximization and Discriminative structure-preserving (CKD), which addresses this problem in two ways. First, we construct a shared semantic graph so that the data of each modality preserve the semantic neighborhood relationships. Second, we introduce the Hilbert-Schmidt Independence Criterion (HSIC) to ensure consistency between the feature similarity and the semantic similarity of samples. Our model not only captures the inter-modality correlation by maximizing the kernel correlation but also preserves the semantically structural information within each modality. Extensive experiments on three public datasets demonstrate that the proposed CKD is competitive with classic subspace learning methods.
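To make the HSIC term mentioned in the abstract concrete, the following minimal sketch computes the empirical HSIC between a feature kernel and a label-derived semantic kernel. This is an illustration only, not the paper's implementation: the linear feature kernel, the label-agreement semantic kernel, the function name `hsic`, and the toy data are all assumptions chosen for clarity.

```python
import numpy as np

def hsic(K, L):
    """Empirical Hilbert-Schmidt Independence Criterion between two
    n-by-n kernel (Gram) matrices K and L:
        HSIC(K, L) = tr(K H L H) / (n - 1)^2,
    where H = I - (1/n) * 11^T centers the kernels in feature space.
    Larger values indicate stronger statistical dependence."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: feature similarity vs. semantic (label) similarity.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))               # 8 samples, 4-dim features
y = np.array([0, 0, 1, 1, 0, 1, 1, 0])        # class labels
K = X @ X.T                                   # linear feature kernel
L = (y[:, None] == y[None, :]).astype(float)  # semantic kernel: same label -> 1
print(hsic(K, L))
```

In a CKD-style objective, a term of this form would be maximized so that samples that are similar in the learned feature space are also similar semantically.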



Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61672265, U1836218) and the 111 Project of the Ministry of Education of China (Grant No. B12018).

Author information

Correspondence to Xiao-Jun Wu.


Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yu, J., Wu, XJ. Cross-modal subspace learning via kernel correlation maximization and discriminative structure-preserving. Multimed Tools Appl 79, 34647–34663 (2020). https://doi.org/10.1007/s11042-020-08989-1
