Approximate Mining of Frequent k-Subgraph Patterns in Evolving Graphs, ACM Transactions on Knowledge Discovery from Data



Approximate Mining of Frequent k-Subgraph Patterns in Evolving Graphs
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-04-21, DOI: 10.1145/3442590
Muhammad Anis Uddin Nasir, Cigdem Aslay, Gianmarco De Francisci Morales, Matteo Riondato


“Perhaps he could dance first and think afterwards, if it isn’t too much to ask him.” S. Beckett, Waiting for Godot

Given a labeled graph, the collection of k-vertex induced connected subgraph patterns that appear in the graph more frequently than a user-specified minimum threshold provides a compact summary of the characteristics of the graph, and finds applications ranging from biology to network science. However, finding these patterns is challenging, even more so for dynamic graphs that evolve over time, due to the streaming nature of the input and the exponential time complexity of the problem. We study this task in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. We present TipTap, a suite of algorithms to compute high-quality approximations of the frequent k-vertex subgraphs w.r.t. a given threshold, at any time (i.e., at any point of the stream), with high probability. In contrast to existing state-of-the-art solutions that require iterating over the entire set of subgraphs in the vicinity of the updated edge, TipTap operates by efficiently maintaining a uniform sample of connected k-vertex subgraphs, thanks to an optimized neighborhood-exploration procedure. We provide a theoretical analysis of the proposed algorithms in terms of their unbiasedness and of the sample size needed to obtain a desired approximation quality. Our analysis relies on sample-complexity bounds that use the Vapnik–Chervonenkis dimension, a key concept from statistical learning theory, which allows us to derive a sufficient sample size that is independent of the size of the graph. The results of our empirical evaluation demonstrate that TipTap returns high-quality results more efficiently and accurately than existing baselines.
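The two ideas the abstract combines, a fixed-size uniform sample maintained over a stream and a sample size chosen from a VC-dimension bound, can be illustrated with a minimal sketch. This is not TipTap itself. It uses plain reservoir sampling (Algorithm R) over an insert-only stream of pattern labels, which stand in for the k-vertex subgraphs created by each edge update; enumerating those subgraphs and handling edge deletions are omitted. The function names, the bound of the form (c / ε²)(d + ln(1/δ)), and the constant c = 0.5 are assumptions for illustration, not values taken from the paper.

```python
import math
import random
from collections import Counter


def sample_size(epsilon, delta, vc_dim, c=0.5):
    """Sample size of the common form (c / eps^2) * (d + ln(1 / delta)).

    The constant c is an assumed placeholder; the bound depends on the
    VC-dimension d and on (epsilon, delta), not on the size of the graph.
    """
    return math.ceil((c / epsilon ** 2) * (vc_dim + math.log(1.0 / delta)))


def reservoir_sample(stream, m, rng):
    """Algorithm R: keep a uniform sample of m items from an insert-only stream."""
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= m:
            # The first m items fill the reservoir.
            reservoir.append(item)
        else:
            # Item t replaces a random slot with probability m / t.
            j = rng.randrange(t)
            if j < m:
                reservoir[j] = item
    return reservoir


def estimate_frequencies(sample):
    """Relative frequency of each pattern label within the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {pattern: count / n for pattern, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    m = sample_size(epsilon=0.05, delta=0.1, vc_dim=3)
    # Hypothetical stream of pattern labels, one per observed subgraph.
    stream = (rng.choice(["path", "star", "triangle"]) for _ in range(100_000))
    sample = reservoir_sample(stream, m, rng)
    threshold = 0.2
    estimates = estimate_frequencies(sample)
    print(m, {p: f for p, f in estimates.items() if f >= threshold})
```

Because the sample size depends only on ε, δ, and the assumed dimension d, memory stays fixed no matter how long the stream runs, which is the property the abstract highlights. A fully-dynamic setting would need a sampling scheme that also copes with deletions, which this sketch does not attempt.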


Updated: 2021-04-21
