Counting frequent patterns in large labeled graphs: a hypergraph-based approach

Meng, Jinghan; Pitaksirianan, Napath; Tu, Yi-Cheng

doi:10.1007/s10618-020-00686-9

Published: 05 May 2020

Volume 34, pages 980–1021, (2020)

Data Mining and Knowledge Discovery

Abstract

In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques related to it. Frequent pattern mining in a single-graph setting poses two main problems: defining a support measure and designing a search scheme. In this paper, we propose a novel framework for designing support measures that brings together existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence/instance hypergraphs. Based on these hypergraphs, we design new support measures, namely the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. More importantly, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is likewise linear-time computable and yields counts that are close to the number of instances of a pattern. We show not only that most major existing support measures and the new measures proposed in this paper can be mapped into the new framework, but also that they occupy different locations in the frequency spectrum. By taking advantage of the new framework, we discover that MVC can be approximated to within a constant factor (in terms of the number of pattern nodes) in polynomial time. Contrary to common belief, we demonstrate that the state-of-the-art overlap-graph-based maximum independent set (MIS) measure also admits constant-factor approximation algorithms. We further show that, using standard linear programming and semidefinite programming techniques, polynomial-time relaxations of both the MVC and MIS measures can be developed, and that their counts lie between MVC and MIS. In addition, we point out that MVC, MIS, and their relaxations are bounded within a constant factor of each other. In summary, all major support measures are unified in the new hypergraph-based framework, which helps reveal their bounding relations and hardness properties.
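
To make the idea of measures occupying different locations in the frequency spectrum concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the definitions commonly used in the literature, which the abstract does not spell out: the minimum-image-based measure is the smallest number of distinct data vertices that any single pattern node maps to; MVC is the size of a minimum vertex cover of the hypergraph whose hyperedges are the vertex sets of occurrences; MIS is the largest number of occurrences that pairwise share no data vertex. The function names, the dictionary representation of occurrences, and the toy triangle graph are all hypothetical, and the brute-force searches are exponential and intended only for tiny inputs.

```python
from itertools import combinations, permutations

def mni_support(occurrences):
    """Minimum-image-based support (assumed definition): for each pattern
    node, count the distinct data vertices it maps to, then take the minimum."""
    if not occurrences:
        return 0
    pattern_nodes = occurrences[0].keys()
    return min(len({occ[p] for occ in occurrences}) for p in pattern_nodes)

def hyperedges(occurrences):
    """One hyperedge per distinct set of data vertices used by an occurrence."""
    return {frozenset(occ.values()) for occ in occurrences}

def mvc_support(occurrences):
    """Exact minimum vertex cover of the hypergraph, by brute force."""
    edges = hyperedges(occurrences)
    vertices = sorted(set().union(*edges))
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(edge & chosen for edge in edges):
                return size

def mis_support(occurrences):
    """Largest set of pairwise vertex-disjoint hyperedges, by brute force.
    Overlap is assumed here to mean sharing at least one data vertex."""
    edges = list(hyperedges(occurrences))
    for size in range(len(edges), 0, -1):
        for candidate in combinations(edges, size):
            if all(a.isdisjoint(b) for a, b in combinations(candidate, 2)):
                return size
    return 0

# Toy data graph: a triangle on vertices 1, 2, 3, all with the same label.
# Pattern: a single edge u-v, so every ordered pair of distinct vertices
# is an occurrence (6 in total).
triangle_occurrences = [{"u": a, "v": b} for a, b in permutations([1, 2, 3], 2)]

mis = mis_support(triangle_occurrences)
mvc = mvc_support(triangle_occurrences)
mni = mni_support(triangle_occurrences)
print(mis, mvc, mni)  # prints "1 2 3"
```

On this toy input the three counts are strictly ordered (1, 2, and 3), which illustrates how one pattern can receive different frequencies under different support measures.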

Notes

We use the words frequency and support interchangeably in this paper, and likewise the word support and the phrase support measure.
Following the conventions of this field, the computing time of a support measure in this paper excludes the time for constructing the underlying framework (e.g., the overlap graph in the MIS case).

Acknowledgements

This work is supported by a grant (IIS-1253980) from the U.S. National Science Foundation (NSF). Jinghan Meng was partially supported by an award (R01GM086707) from the U.S. National Institutes of Health (NIH).

Author information

Authors and Affiliations

University of South Florida, 4202 E Fowler Ave, Tampa, FL, 33620, USA
Jinghan Meng, Napath Pitaksirianan & Yi-Cheng Tu

Corresponding author

Correspondence to Yi-Cheng Tu.

Additional information

Responsible editor: M.J. Zaki

About this article

Cite this article

Meng, J., Pitaksirianan, N. & Tu, YC. Counting frequent patterns in large labeled graphs: a hypergraph-based approach. Data Min Knowl Disc 34, 980–1021 (2020). https://doi.org/10.1007/s10618-020-00686-9

Received: 22 February 2019
Accepted: 15 April 2020
Published: 05 May 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s10618-020-00686-9
