Skip to main content

Advertisement

Log in

Fast Subgraph Matching Strategies Based on Pattern-Only Heuristics

  • Original research article
  • Published:
Interdisciplinary Sciences: Computational Life Sciences Aims and scope Submit manuscript

Abstract

Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Notes

  1. http://www.internationalgenome.org/.

  2. http://www.encodeproject.org.

  3. https://cancergenome.nih.gov/.

References

  1. Mashaghi AR, Ramezanpour A, Karimipour V (2004) Investigation of a protein complex network. Eur Phys J B Condens Matter Complex Syst 41(1):113–121

    Article  CAS  Google Scholar 

  2. Li S, Armstrong CM, Bertin N, Ge H, Milstein S et al (2004) A map of the interactome network of the Metazoan C. elegans. Science 303(5657):540–543

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  3. Faccioli P, Provero P, Herrmann C, Stanca AM, Morcia C, Terzi V (2005) From single genes to co-expression networks: Extracting knowledge from barley functional genomics. Plant Mol Biol 58(5):739–750

    Article  PubMed  CAS  Google Scholar 

  4. Gerstein M B, Kundaje A, Hariharan M, Landt S G, Yan KK, Cheng C, Mu et al (2012) Architecture of the human regulatory network derived from ENCODE data. Nature 489(7414):91–100

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  5. McCall MN (2013) Estimation of gene regulatory networks. J Postdr Res 1(1):60–69

    Google Scholar 

  6. Christensen C, Thakar J, Albert R (2007) Systems-level insights into cellular regulation: inferring, analysing, and modelling intracellular networks. IET Syst Biol 1(2):61–77

    Article  PubMed  CAS  Google Scholar 

  7. Terzer M, Maynard ND, Covert MW, Stelling J (2009) Genome-scale metabolic networks. Wiley Interdiscip Rev Syst Biol Med 1(3):285–297

    Article  PubMed  CAS  Google Scholar 

  8. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555

    Article  PubMed  CAS  Google Scholar 

  9. Redestig H, Szymanski J, Hirai MY, Selbig J, Willmitzer L, Nikoloski Z, Saito K (2018) Data integration, metabolic networks and systems biology, chapter 9. American Cancer Society, Atlanta, pp 261–316

    Google Scholar 

  10. Janjic V, Przulj N (2012) Biological function through network topology: a survey of the human diseasome. Brief Funct Genom 11(6):522–532

    Article  CAS  Google Scholar 

  11. Goh KI, Choi IG (2012) Exploring the human diseasome: the human disease network. Brief Funct Genom 11(6):533–542

    Article  CAS  Google Scholar 

  12. Wysocki K, Ritter L (2011) Diseasome: an approach to understanding gene-disease interactions. Annu Rev Nurs Res 29:55–72

    Article  PubMed  Google Scholar 

  13. Suvarna Vani K, Praveen Kumar K (2018) Feature Extraction of protein contact maps from protein 3D-coordinates. In: Mishra D K, Azar A T, Joshi A (eds) Information and communication technology. Springer, Singapore, pp 311–320

    Chapter  Google Scholar 

  14. Hu J, Shen X, Shao Y, Bystroff C, Zaki M J (2002) Mining protein contact maps. In: Proceedings of the 2Nd international conference on data mining in bioinformatics, BIOKDD’02, London, UK. Springer, pp 3–10

  15. Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34(suppl1):D504–D506

    Article  PubMed  CAS  Google Scholar 

  16. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur A, Anwar N, Schultz N, Bader GD, Sander C (2011) Pathway commons, a web resource for biological pathway data. Nucleic Acids Res 39(suppl1):D685–D690

    Article  PubMed  CAS  Google Scholar 

  17. Chatr-aryamontri A, Oughtred R, Boucher L and J. et al (2017) Rust. The BioGRID interaction database: 2017 update. Nucleic Acids Res 45(D1):d369–d379. Exported from https://app.dimensions.aion2018/08/18

  18. Bonnici V, Russo F, Bombieri N, Pulvirenti A, Giugno R (2014) Comprehensive reconstruction and visualization of non-coding regulatory networks in human. Front Bioeng Biotechnol 2:69

    Article  PubMed  PubMed Central  Google Scholar 

  19. Turkarslan S, Wurtmann EJ, Wu WJ, Jiang N et al (2014) Network portal: a database for storage, analysis and visualization of biological networks. Nucleic Acids Res 42(D1):D184–D190

    Article  PubMed  CAS  Google Scholar 

  20. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5(2):101–113

    Article  PubMed  CAS  Google Scholar 

  21. Yu D, Kim M, Xiao G, Hwang TH (2013) Review of biological network data and its applications. Genom Inform 11(4):200–210

    Article  Google Scholar 

  22. Csermely P, Korcsmaros T, Kiss HJ, London G, Nussinov R (2013) Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 138(3):333–408

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  23. Barabasi AL, Gulbahce N, Loscalzo J (2011) Network medicine: a network-based approach to human disease. Nat Rev Genet 12(1):56–68

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  24. Giuliani A, Filippi S, Bertolaso M (2014) Why network approach can promote a new way of thinking in biology. Front Genet 5:83

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  25. Micale G, Giugno R, Ferro A, Mongiovì M, Shasha D, Pulvirenti A (2018) Fast analytical methods for finding significant labeled graph motifs. Data Min Knowl Discov 32(2):504–531

    Article  Google Scholar 

  26. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

    Article  PubMed  CAS  Google Scholar 

  27. Palsson B, Zengler K (2010) The challenges of integrating multi-omic data sets. Nat Chem Biol 6:787

    Article  PubMed  Google Scholar 

  28. Przulj N (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23(2):e177–e183

    Article  PubMed  CAS  Google Scholar 

  29. Milenkovic T, Przulj N (2008) Uncovering biological network function via graphlet degree signatures. Cancer Inform 6:CIN.S680

    Article  Google Scholar 

  30. Mangan S, Alon U (2003) Structure and function of the feed-forward loop network motif. Proc Nat Acad Sci 100(21):11980–11985

    Article  PubMed  CAS  Google Scholar 

  31. Lemons NW, Hu B, Hlavacek WS (2011) Hierarchical graphs for rule-based modeling of biochemical systems. BMC Bioinform 12(1):45

    Article  Google Scholar 

  32. Micale G, Pulvirenti A, Giugno R, Ferro A (2014) GASOLINE: a greedy and stochastic algorithm for optimal local multiple alignment of interaction networks. PLoS One 9(6):1–15

    Article  CAS  Google Scholar 

  33. Micale G, Continella A, Ferro A, Giugno R, Pulvirenti A (2014) GASOLINE: a cytoscape app for multiple local alignment of PPI networks [version 2; referees: 2 approved, 1 approved with reservations]. F1000Research 3:140

    Article  PubMed  PubMed Central  Google Scholar 

  34. Micale G, Pulvirenti A, Giugno R, Ferro A (2014) Proteins comparison through probabilistic optimal structure local alignment. Front Genet 5:302

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  35. Micale G, Ferro A, Pulvirenti A, Giugno R (2015) SPECTRA: an integrated knowledge base for comparing tissue and tumor-specific PPI networks in human. Front Bioeng Biotechnol 3:58

    Article  PubMed  PubMed Central  Google Scholar 

  36. Bonnici V, Giugno R (2017) On the variable ordering in subgraph isomorphism algorithms. IEEE/ACM Trans Comput Biol Bioinform 14(1):193–203

    Article  PubMed  Google Scholar 

  37. Michael RG, David SJ (1979) Computers and intractability: a guide to the theory of NP-completeness. WH Free. Co., San Francisco, pp 90–91

    Google Scholar 

  38. Giugno R, Bonnici V, Bombieri N, Pulvirenti A, Ferro A, Shasha D (2013) GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures. PLoS One 8(10):e76911

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  39. Bonnici V, Busato F, Micale G, Bombieri N, Pulvirenti A, Giugno R (2016) APPAGATO: an approximate parallel and stochastic graph querying tool for biological networks. Bioinformatics 32(14):2159–2166

    Article  PubMed  CAS  Google Scholar 

  40. Alon N, Yuster R, Zwick U (1995) Color-coding. J ACM (JACM) 42(4):844–856

    Article  Google Scholar 

  41. Kratsch S, Schweitzer P (2012) Isomorphism for graphs of bounded feedback vertex set number. In: Kaplan H (ed) Algorithm theory–SWAT 2010. Springer, Berlin, pp 81–92

  42. Lee J, Han W S, Kasperovics R, Lee J H (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. In: Proceedings of the VLDB endowment, vol 6. VLDB Endowment, pp 133–144

  43. Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26(10):1367–1372

    Article  PubMed  Google Scholar 

  44. Ullmann JR (2011) Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism. J Exp Algorithm 15:1–64

    Google Scholar 

  45. Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinform 14(Suppl 7):S13

    Article  Google Scholar 

  46. Carletti V, Foggia P, Saggese A, Vento M (2017) Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3. IEEE Trans Pattern Anal Mach Intell PP(99):1–1

    Google Scholar 

  47. McGregor JJ (1979) Relational consistency algorithms and their application in finding subgraph and graph isomorphisms. Inf Sci 19(3):229–250

    Article  Google Scholar 

  48. Solnon C (2010) Alldifferent-based filtering for subgraph isomorphism. Artif Intell 174(12):850–864

    Article  Google Scholar 

  49. Haralick RM, Elliott GL (1980) Increasing tree search efficiency for constraint satisfaction problems. Artif Intell 14(3):263–313

    Article  Google Scholar 

  50. Erdos P, Rényi A (1959) On random graphs I. Publ Math Debr 6:290–297

    Google Scholar 

  51. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

    Article  PubMed  Google Scholar 

  52. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp 177–187

  53. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, Von Mering C et al (2012) STRING v9. 1: Protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41(D1):D808–D815

    Article  PubMed  PubMed Central  CAS  Google Scholar 

Download references

Acknowledgements

We would like to thank the authors of algorithm VF3 to have provided the software and for all your support on testing it. R. G., V. B., and A. A. would like to acknowledge the support of the Italian Ministry of Education, Universities and Research (MIUR) “Dipartimenti di Eccellenza 2018-2022”, and Regione del Veneto, Regione del Veneto (IT) (Grants 2016-JPVR16FNCL, 2017-B33C17000440003, GNCS-INDAM). A. P., G. M., and A. F. acknowledge the support of Italian Ministry of Education, Universities and Research (Ministero dell’Istruzione, dell’Università e della Ricerca, MIUR) Grant SCN_00451. D. S. would like to acknowledge the support of the U.S. National Science Foundation Grants MCB-1158273, IOS-1339362, and MCB-1412232.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rosalba Giugno.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Aparo, A., Bonnici, V., Micale, G. et al. Fast Subgraph Matching Strategies Based on Pattern-Only Heuristics. Interdiscip Sci Comput Life Sci 11, 21–32 (2019). https://doi.org/10.1007/s12539-019-00323-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12539-019-00323-0

Keywords

Navigation