Label Embedding for Multi-label Classification Via Dependence Maximization


Abstract

Multi-label classification has attracted extensive attention in various fields. With the emergence of high-dimensional label spaces, much recent work has been devoted to label embedding. However, current embedding approaches either do not take feature-space correlation sufficiently into consideration or require an explicit encoding function while learning the embedded space. Moreover, few of them can be extended to handle missing labels. In this paper, we propose a Label Embedding method via Dependence Maximization (LEDM), which learns a latent space in which label and feature information are embedded simultaneously. To this end, a low-rank factorization model of the label matrix is used to exploit label correlations in place of an encoding process. The dependence between the feature space and the label space is increased via the Hilbert–Schmidt independence criterion to improve predictability. The proposed LEDM can also be easily extended to handle missing labels while learning the embedded space. Comprehensive experimental results on benchmark data sets validate the effectiveness of our approach over state-of-the-art methods in both the complete-label and missing-label cases.
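To make the abstract's two ingredients concrete, the following is a minimal sketch (not the authors' implementation) of an empirical Hilbert–Schmidt independence criterion between a feature matrix and a candidate embedding, together with a simple low-rank factorization of the label matrix via alternating least squares. All names (X, Y, V, W), the kernel choices, and the rank k are illustrative assumptions rather than notation taken from the paper.

```python
import numpy as np

def hsic(X, V, sigma=1.0):
    """Biased empirical HSIC with an RBF kernel on X and a linear kernel on V."""
    n = X.shape[0]
    # Pairwise squared distances and RBF (Gaussian) kernel matrix on the features
    sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
    K = np.exp(-sq_dists / (2 * sigma**2))
    L = V @ V.T                              # linear kernel on the embedding
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def low_rank_factorize(Y, k, n_iter=50, lam=1e-2):
    """Approximate the n x c label matrix Y as V @ W with rank k (alternating ridge least squares)."""
    n, c = Y.shape
    rng = np.random.default_rng(0)
    V = rng.standard_normal((n, k))
    W = rng.standard_normal((k, c))
    for _ in range(n_iter):
        W = np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ Y)    # update W with V fixed
        V = np.linalg.solve(W @ W.T + lam * np.eye(k), W @ Y.T).T  # update V with W fixed
    return V, W

# Toy usage: random features and a sparse binary label matrix
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
Y = (rng.random((100, 8)) > 0.8).astype(float)
V, W = low_rank_factorize(Y, k=4)
print("label reconstruction error:", np.linalg.norm(Y - V @ W))
print("HSIC(X, V):", hsic(X, V))
```

In the paper itself these two terms are optimized jointly in a single objective; the sketch only illustrates each component in isolation under the stated assumptions.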



Acknowledgements

This work was supported by the National Natural Science Foundation of China (61573266).

Author information


Corresponding author

Correspondence to Youlong Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Li, Y., Yang, Y. Label Embedding for Multi-label Classification Via Dependence Maximization. Neural Process Lett 52, 1651–1674 (2020). https://doi.org/10.1007/s11063-020-10331-7

