An experimental study of graph-based semi-supervised classification with additional node information

  • Regular Paper
  • Knowledge and Information Systems

Abstract

The volume of data generated by the internet and social networks is increasing every day, and there is a clear need for efficient ways of extracting useful information from it. As this information can take different forms, it is important to use all the available data representations for prediction; this is often referred to as multi-view learning. In this paper, we consider semi-supervised classification using both regular (plain, tabular) data and structural information coming from a network structure (feature-rich networks). Sixteen techniques are compared; they can be divided into three families: the first uses only the plain features to fit a classification model, the second uses only the network structure, and the third combines both information sources. These three settings are investigated on 10 real-world datasets. Furthermore, network embedding and well-known autocorrelation indicators from spatial statistics are also studied. Possible applications include the automatic classification of web pages or other linked documents, of nodes in a social network, or of proteins in a biological complex system. Based on our findings, we draw some general conclusions and offer advice for tackling this particular classification task: it is clearly observed that some dataset labelings can be better explained by their graph structure, and others by their feature set.
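To make the idea of a labeling being "explained by the graph structure" concrete, the following minimal sketch computes Moran's I, a standard autocorrelation indicator from spatial statistics, for node values on a small graph. It uses the textbook definition; the exact formulation and weighting used in the paper are not shown on this page, and the function name `morans_i`, the toy graph, and the two labelings are illustrative assumptions rather than material from the article.

```python
import numpy as np


def morans_i(adjacency, values):
    """Textbook Moran's I for node values on a (weighted) graph.

    I = (n / sum_ij w_ij) * (z^T W z) / (z^T z), with z = values - mean(values).
    Hypothetical helper for illustration; not the paper's implementation.
    """
    w = np.asarray(adjacency, dtype=float)
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                      # centered node values
    total_weight = w.sum()                # sum of all edge weights
    variance_term = (z ** 2).sum()
    if total_weight == 0 or variance_term == 0:
        raise ValueError("Moran's I is undefined for an empty graph or constant values")
    return (len(x) / total_weight) * (z @ w @ z) / variance_term


# Toy undirected graph (assumed): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

aligned = [0, 0, 0, 1, 1, 1]   # labels follow the two triangles
mixed = [0, 1, 0, 1, 0, 1]     # labels alternate regardless of structure

print("aligned labeling:", morans_i(A, aligned))
print("mixed labeling:  ", morans_i(A, mixed))
```

On this toy graph the labeling that follows the two triangles gives a positive value (neighboring nodes tend to share a label), while the alternating labeling gives a negative one. Under these assumptions, a clearly positive value suggests that the network structure carries information about the labels.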

Notes

  1. Graph and network will be used interchangeably.

  2. Recall that autocorrelation means that neighboring nodes tend to take similar values.

  3. Hence the name autologistic.

  4. The datasets are available at http://github.com/B-Lebichot/Research.

Acknowledgements

This work was partially supported by the Elis-IT project funded by the “Région wallonne” and the Brufence project supported by INNOVIRIS (“Région bruxelloise”), Belgium. We thank these institutions for giving us the opportunity to conduct both fundamental and applied research. We also thank the anonymous reviewers for their relevant remarks and suggestions, which helped us to significantly improve the manuscript.

Author information

Corresponding author

Correspondence to Bertrand Lebichot.

About this article

Cite this article

Lebichot, B., Saerens, M. An experimental study of graph-based semi-supervised classification with additional node information. Knowl Inf Syst 62, 4337–4371 (2020). https://doi.org/10.1007/s10115-020-01500-0

  • DOI: https://doi.org/10.1007/s10115-020-01500-0
