PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
Journal of Intelligent Information Systems (IF 3.4), Pub Date: 2019-12-17, DOI: 10.1007/s10844-019-00590-9
Yoshitaka Yamamoto, Yasuo Tabei, Koji Iwanuma

Here, we present a novel algorithm for frequent itemset mining in streaming data (FIM-SD). Over the past decade, various FIM-SD methods have been proposed for one-pass approximation settings, in which the support of each itemset is approximated rather than counted exactly. They can be categorized into two approximation types: parameter-constrained (PC) mining and resource-constrained (RC) mining. PC methods control the maximum error that the approximate support can contain, based on a pre-defined parameter. In contrast, RC methods limit the maximum memory consumption based on resource constraints. However, the memory consumption of the existing PC methods can grow exponentially, while the maximum error of the existing RC methods can grow rapidly. In this study, we address this problem by introducing a hybrid approach of PC-RC approximations, called PARASOL. For any streaming data, PARASOL is guaranteed to provide a condensed representation, called a Δ-covered set, which can be regarded as an extension of closedness compression; when Δ = 0, the solution corresponds to the ordinary closed itemsets. PARASOL searches for approximate closed itemsets from which the frequent itemsets and their supports can be restored, with the maximum error bounded by an integer, Δ. We then empirically demonstrate that the proposed algorithm significantly outperforms the state-of-the-art PC and RC methods for FIM-SD.
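
The Δ = 0 case mentioned in the abstract, ordinary closed itemsets, can be illustrated with a small sketch. This is not PARASOL and not a streaming algorithm: it is a brute-force illustration over a tiny in-memory list of transactions, and all function names and the sample data are hypothetical. It relies only on the standard definitions: an itemset is closed if no proper superset has the same support, and the support of any itemset equals the largest support among its closed supersets.

```python
from itertools import combinations


def support_counts(transactions):
    """Count the support of every non-empty itemset that occurs (brute force)."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def closed_itemsets(counts):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        x: s
        for x, s in counts.items()
        if not any(x < y and s == sy for y, sy in counts.items())
    }


def restore_support(itemset, closed):
    """Recover a support as the maximum over closed supersets (0 if none)."""
    q = frozenset(itemset)
    return max((s for c, s in closed.items() if q <= c), default=0)


# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
]

counts = support_counts(transactions)
closed = closed_itemsets(counts)

for itemset, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In this toy example, seven itemsets occur but only four are closed, and every support can still be restored exactly from the closed ones. According to the abstract, a Δ-covered set relaxes this exactness so that restored supports may deviate by at most Δ; the sketch does not attempt to implement that relaxation.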

Updated: 2019-12-17