Counting frequent patterns in large labeled graphs: a hypergraph-based approach, Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2020-05-05, DOI: 10.1007/s10618-020-00686-9
Jinghan Meng, Napath Pitaksirianan, Yi-Cheng Tu

In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques related to it. Frequent pattern mining in a single-graph setting involves two main problems: the support measure and the search scheme. We propose a novel framework for designing support measures that brings together the existing minimum-image-based and overlap-graph-based support measures. The framework is built on the concept of occurrence/instance hypergraphs. On this basis, we design a series of new support measures, namely the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. More importantly, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is also linear-time computable and yields counts close to the number of instances of a pattern. We show not only that most major existing support measures and the new measures proposed in this paper can be mapped into the new framework, but also that they occupy different locations on the frequency spectrum. Using the new framework, we find that MVC can be approximated to within a constant factor (in terms of the number of pattern nodes) in polynomial time. Contrary to common belief, we demonstrate that the state-of-the-art overlap-graph-based maximum independent set (MIS) measure also admits constant-factor approximation algorithms. We further show that, using standard linear programming and semidefinite programming techniques, polynomial-time relaxations of both the MVC and MIS measures can be developed, and that their counts lie between MVC and MIS. In addition, we point out that MVC, MIS, and their relaxations are bounded within a constant factor. In summary, all major support measures are unified in the new hypergraph-based framework, which helps reveal their bounding relations and hardness properties.
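To make the measures named in the abstract concrete, the following is a minimal Python sketch on a toy input. It is not the authors' implementation. It assumes the usual textbook readings of the three measures: the minimum-image-based measure is the smallest number of distinct data vertices that any single pattern node maps to; MIS is the largest set of pairwise vertex-disjoint instances; MVC is the smallest set of data vertices that touches every instance, where each instance is treated as a hyperedge over data vertices. The MI measure is left out because the abstract does not define it. The toy data (a single-edge pattern matched in a triangle with vertices 1, 2, 3) and all function names are hypothetical, and the brute-force searches are only suitable for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy input: each tuple maps pattern nodes (0, 1) to data vertices.
# Pattern: one edge with identically labeled endpoints; data graph: a triangle.
occurrences = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3)]


def instance_sets(occs):
    """Distinct vertex sets of the occurrences (the hyperedges), in a fixed order."""
    return sorted({frozenset(o) for o in occs}, key=sorted)


def mni_support(occs):
    """Minimum-image-based support: fewest distinct images over pattern nodes."""
    k = len(occs[0])
    return min(len({occ[i] for occ in occs}) for i in range(k))


def mis_support(occs):
    """Largest number of pairwise vertex-disjoint instances (brute force)."""
    edges = instance_sets(occs)
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                return size
    return 0


def mvc_support(occs):
    """Fewest data vertices that intersect every instance (brute force)."""
    edges = instance_sets(occs)
    vertices = sorted(set().union(*edges))
    for size in range(len(vertices) + 1):
        for cover in combinations(vertices, size):
            chosen = set(cover)
            if all(e & chosen for e in edges):
                return size
    return len(vertices)


def greedy_cover(occs):
    """Textbook heuristic (an assumption here, not necessarily the paper's
    algorithm): take every vertex of a maximal set of disjoint instances.
    The result is a vertex cover of size at most k times the optimum,
    where k is the number of pattern nodes."""
    cover = set()
    for e in instance_sets(occs):
        if cover.isdisjoint(e):
            cover |= e
    return cover


print(mis_support(occurrences), mvc_support(occurrences), mni_support(occurrences))  # 1 2 3
```

On this toy input the three counts are 1, 2, and 3, which illustrates how the measures can sit at different points of the frequency spectrum. Under the definitions assumed in this sketch, MIS never exceeds MVC (each disjoint instance needs its own covering vertex), and MVC never exceeds the minimum-image count (the images of any one pattern node already touch every instance).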

Updated: 2020-05-05
