SAT‐based and CP‐based declarative approaches for Top‐Rank‐K closed frequent itemset mining
International Journal of Intelligent Systems (IF 7), Pub Date: 2020-10-16, DOI: 10.1002/int.22294
Sa'ed Abed, Areej A. Abdelaal, Mohammad H. Al‐Shayeji, Imtiaz Ahmad

Top‐Rank‐K Frequent Itemset (or Pattern) Mining (FPM) is an important data mining task in which the user decides how many top frequency ranks of patterns (itemsets) to mine from a transactional dataset. This problem does not require the minimum support threshold parameter that is typically used in FPM problems. Instead, algorithms that solve the Top‐Rank‐K FPM problem take K, the number of required frequency ranks of itemsets, as input and compute the threshold internally. This paper presents two declarative approaches to the Top‐Rank‐K Closed FPM problem. The first approach is based on Boolean Satisfiability (SAT): we propose an effective encoding of the problem along with an efficient algorithm that employs this encoding. The second approach is based on Constraint Programming (CP): a simple CP model is exploited in an innovative manner to mine the Top‐Rank‐K closed frequent itemsets from transactional datasets. Both approaches are evaluated experimentally against other declarative and imperative algorithms. The proposed SAT‐based approach significantly outperforms IM, another SAT‐based approach, and outperforms the proposed CP‐based approach on sparse and moderate datasets, whereas the CP‐based approach excels on dense datasets. An extensive study has been conducted to assess the proposed approaches in terms of their feasibility, performance factors, and practicality of use.
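
To make the task definition concrete, here is a minimal brute-force sketch in Python. It is not the SAT encoding or the CP model from the paper, which the abstract does not describe. It only illustrates how K can replace a minimum support threshold. It assumes that one frequency rank corresponds to one distinct support value and that an itemset is closed when it equals the intersection of all transactions containing it. The function name `top_rank_k_closed` and the toy dataset are hypothetical.

```python
from itertools import combinations


def top_rank_k_closed(transactions, k):
    """Brute-force illustration only (exponential in the number of items).

    Returns {closed itemset: support} for every closed itemset whose support
    falls within the k highest distinct support values (assumed rank notion).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Cover: the transactions that contain the candidate itemset.
            cover = [t for t in transactions if candidate <= t]
            if not cover:
                continue
            # Closed if the candidate equals the intersection of its cover.
            if frozenset.intersection(*cover) == candidate:
                closed[candidate] = len(cover)

    # The support threshold is derived internally from k.
    top_supports = sorted(set(closed.values()), reverse=True)[:k]
    if not top_supports:
        return {}
    threshold = top_supports[-1]
    return {s: c for s, c in closed.items() if c >= threshold}


# Hypothetical toy dataset.
dataset = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c"}, {"b", "c"}, {"d"}]
result = top_rank_k_closed(dataset, k=2)
```

With this dataset and k=2, the derived threshold is 3, so the result contains {b} with support 4 and {a, b} and {b, c} with support 3. The itemset {a} also has support 3 but is excluded because it is not closed: every transaction containing a also contains b.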

Updated: 2020-10-16