Online summarization of dynamic graphs using subjective interestingness for sequential data
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2020-09-09, DOI: 10.1007/s10618-020-00714-8
Sarang Kapoor, Dhish Kumar Saxena, Matthijs van Leeuwen

Many real-world phenomena can be represented as dynamic graphs, i.e., networks that change over time. The problem of dynamic graph summarization, i.e., to succinctly describe the evolution of a dynamic graph, has been widely studied. Existing methods typically use objective measures to find fixed structures such as cliques, stars, and cores. Most of the methods, however, do not consider the problem of online summarization, where the summary is incrementally conveyed to the analyst as the graph evolves, and (thus) do not take into account the knowledge of the analyst at a specific moment in time. We address this gap in the literature through a novel, generic framework for subjective interestingness for sequential data. Specifically, we iteratively identify atomic changes, called ‘actions’, that provide most information relative to the current knowledge of the analyst. For this, we introduce a novel information gain measure, which is motivated by the minimum description length (MDL) principle. With this measure, our approach discovers compact summaries without having to decide on the number of patterns. As such, we are the first to combine approaches for data mining based on subjective interestingness (using the maximum entropy principle) with pattern-based summarization (using the MDL principle). We instantiate this framework for dynamic graphs and dense subgraph patterns, and present DSSG, a heuristic algorithm for the online summarization of dynamic graphs by means of informative actions, each of which represents an interpretable change to the connectivity structure of the graph. The experiments on real-world data demonstrate that our approach effectively discovers informative summaries. We conclude with a case study on data from an airline network to show its potential for real-world applications.
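
The core idea in the abstract, repeatedly reporting whichever pattern is most informative relative to what the analyst currently believes, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the uniform per-edge Bernoulli belief, the description length of one unit per node, the ratio score, and the `min_score` stopping threshold (a crude stand-in for the paper's MDL-motivated information gain). It is not DSSG, and it treats a single static snapshot rather than a dynamic graph.

```python
import math
from itertools import combinations


def information_content(nodes, edges, belief):
    """Bits needed to encode the node pairs of a pattern under the belief."""
    bits = 0.0
    for pair in combinations(sorted(nodes), 2):
        p = belief[pair]  # analyst's believed probability that this edge exists
        bits += -math.log2(p) if pair in edges else -math.log2(1.0 - p)
    return bits


def score(nodes, edges, belief):
    # Hypothetical score: bits of information per unit of description length,
    # with description length assumed to be one unit per node in the pattern.
    return information_content(nodes, edges, belief) / len(nodes)


def summarize(candidates, edges, belief, min_score=1.0):
    """Greedily report patterns until the best remaining one is uninformative."""
    summary = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda c: score(c, edges, belief))
        if score(best, edges, belief) < min_score:
            break
        summary.append(best)
        remaining.remove(best)
        # After seeing the pattern, the analyst is nearly certain about its pairs.
        for pair in combinations(sorted(best), 2):
            belief[pair] = 0.99 if pair in edges else 0.01
    return summary


# Toy graph: a 4-clique on nodes 0..3 plus one extra edge (4, 5).
edges = set(combinations(range(4), 2)) | {(4, 5)}
# Assumed prior: the analyst expects any edge with probability 0.2.
belief = {pair: 0.2 for pair in combinations(range(6), 2)}
candidates = [frozenset({0, 1, 2, 3}), frozenset({0, 1, 2}), frozenset({3, 4, 5})]

summary = summarize(candidates, edges, belief)
print(summary)
```

In this toy run, only the 4-clique is reported. Once the analyst knows about it, the sub-clique on nodes 0, 1, 2 carries almost no new information, and the sparse candidate never reaches the threshold. The number of reported patterns therefore falls out of the scoring rather than being fixed in advance, which loosely mirrors the abstract's claim about not having to decide on the number of patterns.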




Updated: 2020-09-10