GrAFCI+: A fast generator-based algorithm for mining frequent closed itemsets
Knowledge and Information Systems (IF 2.7), Pub Date: 2021-05-18, DOI: 10.1007/s10115-021-01575-3
Makhlouf Ledmi, Samir Zidat, Aboubekeur Hamdi-Cherif

Mining itemsets for association rule generation is a fundamental data mining task that originally stems from the traditional market basket analysis problem. However, enumerating all frequent itemsets (FIs) remains costly, especially in dense datasets or with low support thresholds. In this paper, a novel theorem establishes the relationship between frequent closed itemsets (FCIs) and frequent generator itemsets (FGIs), and proves that mining FCIs is equivalent to mining FGIs unified with their full-support and extension items. Based on this theorem, a generator-based algorithm for mining FCIs, called GrAFCI+, is proposed and explained in detail, including its correctness. The comparative effectiveness of the algorithm is first investigated in terms of scalability and of the compression rate, a measure of the interestingness of a given FI representation. Extensive experiments are then carried out on eight datasets against four state-of-the-art algorithms, namely DCI_CLOSED*, DCI_PLUS, FPClose, and NAFCP. The results show that, in most cases, the proposed algorithm is more efficient in execution time than these algorithms. Because the main goal of GrAFCI+ is to reduce runtime, it pays a cost in memory, especially when the support threshold is very small. However, this cost is moderate: for large support values, GrAFCI+ ranks second in memory utilization, behind only one of the four competitors. Overall, GrAFCI+ gives better results than most of its competitors.
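The generator/closure relationship the abstract builds on can be illustrated on a toy database. The sketch below is not GrAFCI+ (the paper's data structures and pruning are not reproduced); it is a minimal brute-force illustration under our own assumptions. It enumerates frequent generators level by level, taking a generator to be an itemset whose support is strictly lower than that of each of its immediate subsets, and closes each one by adding the items present in every transaction that contains it. We read those added items as playing the role of the abstract's full-support items; that mapping, the toy data, and the function names (`frequent_generators`, `closed_from_generators`, `closed_brute_force`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transaction database (not from the paper); each letter is an item.
TRANSACTIONS = [
    frozenset("acd"),
    frozenset("bce"),
    frozenset("abce"),
    frozenset("be"),
    frozenset("abce"),
]


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions that contain `itemset`.

    The items this adds to `itemset` occur in every supporting transaction."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        raise ValueError("closure is undefined for an itemset with zero support")
    return frozenset.intersection(*covering)


def frequent_generators(transactions, minsup):
    """Level-wise search for frequent generators (assumed definition: an itemset
    whose support is strictly lower than that of each immediate subset).
    Returns a dict mapping each generator to its support. minsup is assumed >= 1."""
    items = sorted(set().union(*transactions))
    gens = {frozenset(): len(transactions)}  # the empty set is always a generator
    level = [frozenset()]
    while level:
        next_level, seen = [], set()
        for g in level:
            for i in items:
                cand = g | {i}
                if i in g or cand in seen:
                    continue
                seen.add(cand)
                # Generators are downward closed, so every immediate subset
                # must already be a frequent generator.
                if any(cand - {j} not in gens for j in cand):
                    continue
                s = support(cand, transactions)
                if s >= minsup and all(s < gens[cand - {j}] for j in cand):
                    gens[cand] = s
                    next_level.append(cand)
        level = next_level
    return gens


def closed_from_generators(transactions, minsup):
    """Frequent closed itemsets obtained as closures of frequent generators.
    A closure has the same support as its generator; duplicates collapse
    because the result is keyed by the closed itemset."""
    fcis = {}
    for g, s in frequent_generators(transactions, minsup).items():
        c = closure(g, transactions)
        if c and s >= minsup:  # skip an empty closure of the empty generator
            fcis[c] = s
    return fcis


def closed_brute_force(transactions, minsup):
    """Cross-check: test every non-empty itemset for frequency and closedness."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s >= minsup and closure(x, transactions) == x:
                result[x] = s
    return result


if __name__ == "__main__":
    result = closed_from_generators(TRANSACTIONS, minsup=2)
    assert result == closed_brute_force(TRANSACTIONS, minsup=2)
    for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print("".join(sorted(itemset)), s)
```

Several generators can share one closure (in this toy data, `ab` and `ae` both close to `abce`), which is why the result is keyed by the closed itemset. The brute-force function exists only as a correctness cross-check and scales exponentially with the number of items.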

Updated: 2021-05-18