Abstract
Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.
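To make the problem statement concrete, the following is a minimal Python sketch of backtracking subgraph matching in which the order of pattern vertices is computed from the pattern graph alone. It is an illustration under stated assumptions, not one of the algorithms evaluated in this work: the names `pattern_order` and `find_matches` are hypothetical, graphs are assumed to be simple and undirected, a match is taken to be an injective mapping that preserves every pattern edge (non-induced matching), and the ordering rule (most already-ordered neighbours first, ties broken by degree) is only one plausible pattern-only heuristic.

```python
from typing import Dict, List, Set

# Undirected simple graph as adjacency sets (an assumed representation).
Graph = Dict[int, Set[int]]


def pattern_order(pattern: Graph) -> List[int]:
    """Order pattern vertices using information from the pattern only.

    Hypothetical rule: repeatedly pick the vertex with the most neighbours
    already in the order, breaking ties by higher degree, then lower id.
    """
    order: List[int] = []
    placed: Set[int] = set()
    remaining = set(pattern)
    while remaining:
        best = max(
            remaining,
            key=lambda v: (len(pattern[v] & placed), len(pattern[v]), -v),
        )
        order.append(best)
        placed.add(best)
        remaining.remove(best)
    return order


def find_matches(pattern: Graph, target: Graph) -> List[Dict[int, int]]:
    """Enumerate all injective, edge-preserving mappings pattern -> target."""
    order = pattern_order(pattern)
    matches: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(depth: int) -> None:
        if depth == len(order):
            matches.append(dict(mapping))
            return
        p = order[depth]
        # Target images of the pattern neighbours that are already mapped.
        mapped = [mapping[q] for q in pattern[p] if q in mapping]
        if mapped:
            # A candidate must be adjacent to every such image.
            candidates = set.intersection(*(target[t] for t in mapped))
        else:
            candidates = set(target)
        for t in sorted(candidates):
            # Keep the mapping injective and apply a simple degree filter.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            mapping[p] = t
            used.add(t)
            extend(depth + 1)
            del mapping[p]
            used.remove(t)

    extend(0)
    return matches


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    # A 4-cycle 0-1-2-3 with the chord 0-2.
    target = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
    print(len(find_matches(triangle, target)))
```

Because the vertex order is fixed before the search starts and never consults the target, its cost depends only on the (usually small) pattern; the target is touched only when candidates are generated and filtered during backtracking.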
Acknowledgements
We would like to thank the authors of the VF3 algorithm for providing the software and for their support in testing it. R. G., V. B., and A. A. would like to acknowledge the support of the Italian Ministry of Education, Universities and Research (MIUR) “Dipartimenti di Eccellenza 2018-2022”, and Regione del Veneto (IT) (Grants 2016-JPVR16FNCL, 2017-B33C17000440003, GNCS-INDAM). A. P., G. M., and A. F. acknowledge the support of the Italian Ministry of Education, Universities and Research (Ministero dell’Istruzione, dell’Università e della Ricerca, MIUR) Grant SCN_00451. D. S. would like to acknowledge the support of the U.S. National Science Foundation Grants MCB-1158273, IOS-1339362, and MCB-1412232.
About this article
Cite this article
Aparo, A., Bonnici, V., Micale, G. et al. Fast Subgraph Matching Strategies Based on Pattern-Only Heuristics. Interdiscip Sci Comput Life Sci 11, 21–32 (2019). https://doi.org/10.1007/s12539-019-00323-0
DOI: https://doi.org/10.1007/s12539-019-00323-0