Hierarchical high-order co-clustering algorithm by maximizing modularity

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Star-structured high-order heterogeneous data are ubiquitous. Such data consist of objects of a certain type connected to several other types of data, or feature spaces, so that the overall data schema forms a star structure of inter-relationships. In this paper, we study the problem of co-clustering star-structured high-order heterogeneous data. We present a new solution, a Hierarchical High-order Co-clustering Algorithm by Maximizing Modularity (MHCoC), which iteratively optimizes a modularity-based objective function and finally converges to a unique clustering result. In contrast to traditional co-clustering methods, MHCoC merges information from the multiple feature spaces of high-order heterogeneous data. Moreover, MHCoC takes a top-down strategy and performs a greedy divisive procedure, generating a tree-like hierarchical clustering result that reveals the relationships between clusters. To illustrate the process in more detail, we design a toy example that describes how MHCoC selects the appropriate co-cluster and splits it. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.
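As a rough illustration of what a modularity-based co-clustering objective on star-structured data can look like, the sketch below scores a candidate split by summing a standard bipartite (Barber-style) modularity over each feature space attached to the central objects. This is an assumption made for illustration only: the abstract does not state the exact objective MHCoC uses, and the function names `bipartite_modularity` and `star_modularity` as well as the toy matrices are hypothetical.

```python
def bipartite_modularity(A, row_labels, col_labels):
    """Bipartite modularity of one object-by-feature matrix A (list of rows).

    Q = (1/m) * sum_ij (A_ij - k_i * d_j / m) * [row_labels[i] == col_labels[j]]
    Assumption: this is the textbook bipartite form, not necessarily the paper's.
    """
    m = sum(sum(row) for row in A)            # total edge weight
    k = [sum(row) for row in A]               # object (row) degrees
    d = [sum(col) for col in zip(*A)]         # feature (column) degrees
    q = 0.0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if row_labels[i] == col_labels[j]:
                q += a - k[i] * d[j] / m      # observed minus expected weight
    return q / m


def star_modularity(views, row_labels, col_labels_per_view):
    """Sum the bipartite modularity over all feature spaces of the star."""
    return sum(
        bipartite_modularity(A, row_labels, cols)
        for A, cols in zip(views, col_labels_per_view)
    )


# Toy star: four central objects linked to two feature spaces (hypothetical data).
view1 = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
view2 = [[1, 0],
         [1, 0],
         [0, 1],
         [0, 1]]

# Candidate: keep everything in one co-cluster.
q_merged = star_modularity([view1, view2], [0, 0, 0, 0],
                           [[0, 0, 0, 0], [0, 0]])
# Candidate: split objects and features into two co-clusters.
q_split = star_modularity([view1, view2], [0, 0, 1, 1],
                          [[0, 0, 1, 1], [0, 1]])
```

In this toy case the split scores higher than the single co-cluster, so a greedy divisive step that accepts only modularity-increasing splits would divide it. How MHCoC actually selects and splits co-clusters is described in the full article, not here.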

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61762078, 61363058, 61762079), the Guangxi Key Laboratory of Trusted Software (No. kx201910), and the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-08).

Author information

Corresponding author

Correspondence to Huifang Ma.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

About this article

Cite this article

Wei, J., Ma, H., Liu, Y. et al. Hierarchical high-order co-clustering algorithm by maximizing modularity. Int. J. Mach. Learn. & Cyber. 12, 2887–2898 (2021). https://doi.org/10.1007/s13042-021-01375-9

  • DOI: https://doi.org/10.1007/s13042-021-01375-9
