A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs

Neural Processing Letters

Abstract

As a fundamental technique for data analysis, graph clustering, which groups graph data into clusters, has attracted great attention in recent years. In this paper, we present DPOCG, a dynamic programming framework for large-scale online clustering on graphs, which improves the scalability of a wide range of graph clustering algorithms. Specifically, DPOCG first identifies the nodes on a large-scale graph whose states are unchanged from the previous time step, then merges these unchanged nodes into supernodes, which greatly reduces the size of the graph at the current time, and collapses nodes whose degrees are less than a predefined threshold. DPOCG then partitions the reduced graph into clusters using our density-based graph clustering algorithm (DGCM). In addition, we theoretically analyze DPOCG in terms of supernode generation, clustering on the reduced graph, and computational complexity. We evaluate DPOCG on one synthetic dataset and seven real-world datasets, and the experimental results show that DPOCG consumes less running time and improves the efficiency of clustering.
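
The graph-reduction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define a node's "state", how supernodes are formed, or how low-degree nodes are collapsed, so the code assumes that (a) a node is unchanged when its neighbor set is identical in both snapshots, (b) adjacent unchanged nodes merge into one supernode, and (c) a node with degree below the threshold merges into its highest-degree neighbor. The names `unchanged_nodes`, `reduce_graph`, and `min_degree` are hypothetical, node ids are assumed to be comparable (for example integers), and the DGCM clustering step is omitted because the abstract does not describe it.

```python
from collections import defaultdict


def unchanged_nodes(prev_adj, curr_adj):
    """Nodes whose neighbor set is identical in both snapshots (assumed notion of 'state')."""
    return {v for v, nbrs in curr_adj.items() if prev_adj.get(v) == nbrs}


def find(parent, v):
    """Union-find lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def union(parent, a, b):
    """Merge two groups; the smaller root id becomes the representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def reduce_graph(prev_adj, curr_adj, min_degree):
    """Return (reduced adjacency, node -> representative) for the current snapshot."""
    parent = {v: v for v in curr_adj}
    stable = unchanged_nodes(prev_adj, curr_adj)

    # Assumption (b): adjacent unchanged nodes form one supernode.
    for v in stable:
        for w in curr_adj[v]:
            if w in stable:
                union(parent, v, w)

    # Assumption (c): a low-degree node is absorbed by its highest-degree neighbor.
    for v, nbrs in curr_adj.items():
        if nbrs and len(nbrs) < min_degree:
            target = max(nbrs, key=lambda w: (len(curr_adj[w]), w))
            union(parent, v, target)

    # Rebuild the adjacency between representatives, dropping internal edges.
    reduced = defaultdict(set)
    for v, nbrs in curr_adj.items():
        rv = find(parent, v)
        reduced[rv]  # make sure isolated representatives still appear
        for w in nbrs:
            rw = find(parent, w)
            if rv != rw:
                reduced[rv].add(rw)
                reduced[rw].add(rv)
    return dict(reduced), {v: find(parent, v) for v in curr_adj}


# Two triangles joined through node 4; node 8 attaches to node 4 in the new snapshot.
prev_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5},
            5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6}}
curr_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 8},
            5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6}, 8: {4}}
reduced, mapping = reduce_graph(prev_adj, curr_adj, min_degree=2)
print(reduced)
print(mapping)
```

In this toy example the two unchanged triangles become supernodes represented by nodes 1 and 5, the new degree-1 node 8 is absorbed by node 4, and the eight-node graph shrinks to a three-node path. Any clustering algorithm run on the reduced graph can then be mapped back to the original nodes through `mapping`.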

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61672119.

Author information

Corresponding author

Correspondence to Yantao Li.

About this article

Cite this article

Li, Y., Zhao, X. & Qu, Z. A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs. Neural Process Lett 52, 1613–1629 (2020). https://doi.org/10.1007/s11063-020-10329-1

  • DOI: https://doi.org/10.1007/s11063-020-10329-1