Discovering closed and maximal embedded patterns from large tree data
Data & Knowledge Engineering (IF 2.7), Pub Date: 2021-04-19, DOI: 10.1016/j.datak.2021.101890
Xiaoying Wu, Dimitri Theodoratos, Nikos Mamoulis

Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with multiple applications in practice. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees.
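
To make the induced/embedded distinction concrete: an edge of an induced pattern must map to a parent-child edge of the data tree, whereas an edge of an embedded pattern may map to any ancestor-descendant pair. The Python sketch below only illustrates this difference on a hypothetical toy tree; the `PARENT` map and both function names are assumptions made for the example and do not come from the paper.

```python
# Hypothetical toy data tree, stored as child -> parent (the root maps to None).
# It is the chain r -> a -> b -> c.
PARENT = {"r": None, "a": "r", "b": "a", "c": "b"}


def is_parent(u, v, parent=PARENT):
    """True if u is the direct parent of v (what an induced pattern edge requires)."""
    return parent.get(v) == u


def is_ancestor(u, v, parent=PARENT):
    """True if u is a proper ancestor of v (what an embedded pattern edge requires)."""
    node = parent.get(v)
    while node is not None:
        if node == u:
            return True
        node = parent.get(node)
    return False


# The pattern edge (a, c) has no induced match here, but it has an embedded one,
# because a is an ancestor of c without being its parent.
induced_match = is_parent("a", "c")
embedded_match = is_ancestor("a", "c")
```

Every parent-child pair is also an ancestor-descendant pair, so each induced occurrence is an embedded occurrence as well, which is one reason embedded mining tends to report more patterns.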

We address the problem of summarizing embedded tree patterns extracted from large data trees, by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pattern search space pruning rules to proactively detect and prune branches in the pattern search space which do not correspond to closed patterns. The pruning rules are accommodated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster with negligible overhead on pattern pruning.
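
As a rough illustration of what "closed" and "maximal" mean, the sketch below uses the standard definitions: a frequent pattern is closed if no proper superpattern has the same support, and maximal if no proper superpattern is frequent. It is not the paper's algorithm. It swaps in itemsets (Python `frozenset`s with the subset relation) as a stand-in for embedded tree patterns, and it filters an already-mined pattern set after the fact rather than checking closedness locally or pruning the search space. The function name and the toy supports are assumptions.

```python
def closed_and_maximal(supports):
    """Split frequent patterns into closed and maximal ones.

    supports: dict mapping each frequent pattern (a frozenset, used here as a
    stand-in for a tree pattern) to its support count.
    """
    closed, maximal = set(), set()
    for pattern, support in supports.items():
        # Proper superpatterns that are themselves frequent.
        supers = [q for q in supports if pattern < q]
        if not supers:
            maximal.add(pattern)
        if not any(supports[q] == support for q in supers):
            closed.add(pattern)
    return closed, maximal


# Toy supports derived from the transactions {a,b}, {a,b}, {a,b,c}, {a,c}
# with a minimum support of 2 (hypothetical data).
supports = {
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
    frozenset("ac"): 2,
}
closed, maximal = closed_and_maximal(supports)
```

In this toy input, `{b}` and `{c}` are dropped from the closed set because `{a,b}` and `{a,c}` have the same supports, and only `{a,b}` and `{a,c}` are maximal. Every maximal pattern is also closed, so the maximal set is the more compact summary, but it does not retain the supports of the patterns it subsumes.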




Updated: 2021-04-27