当前位置: X-MOL 学术Constraints › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Prefix-projection global constraint and top- k approach for sequential pattern mining
Constraints ( IF 1.6 ) Pub Date : 2016-08-26 , DOI: 10.1007/s10601-016-9252-z
Amina Kemmar , Yahia Lebbah , Samir Loudni , Patrice Boizumault , Thierry Charnois

Sequential pattern mining (SPM) is an important data mining problem with broad applications. SPM is a hard problem due to the huge number of intermediate subsequences to be considered. State of the art approaches for SPM (e.g., PrefixSpan Pei et al. 2001) are largely based on the pattern-growth approach, where for each frequent prefix subsequence, only its related suffix subsequences need to be considered, and the database is recursively projected into smaller ones. Many authors have promoted the use of constraints to focus on the most promising patterns according to the interests of the end user. The top-k SPM problem is also used to cope with the difficulty of thresholding and to control the number of solutions. State of the art methods developed for SPM and top-k SPM, though efficient, are locked into a rather rigid search strategy, and suffer from the lack of declarativity and flexibility. Indeed, adding new constraints usually amounts to changing the data-structures used in the core of the algorithm, and combining these new constraints often require new developments. Recent works (e.g. Kemmar et al. 2014; Négrevergne and Guns 2015) have investigated the use of Constraint Programming (CP) for SPM. However, despite their nice declarative aspects, all these modelings have scaling problems, due to the huge size of their constraint networks. To address this issue, we propose the Prefix-Projection global constraint, which encapsulates both the subsequence relation as well as the frequency constraint. Its filtering algorithm relies on the principle of projected databases which allows to keep in the variables domain, only values leading to a frequent pattern in the database. Prefix-Projection filtering algorithm enforces domain consistency on the variable succeeding the current frequent prefix in polynomial time. This global constraint also allows for a straightforward implementation of additional constraints such as size, item membership, regular expressions and any combination of them. Experimental results show that our approach clearly outperforms existing CP approaches and competes well with the state-of-the-art methods on large datasets for mining frequent sequential patterns, sequential patterns under various constraints, and top-k sequential patterns. Unlike existing CP methods, our approach achieves a better scalability.

中文翻译:

前缀投影全局约束和top-k方法用于顺序模式挖掘

顺序模式挖掘(SPM)是一个具有广泛应用的重要数据挖掘问题。由于要考虑大量的中间子序列,因此SPM是一个难题。SPM的最新方法(例如PrefixSpan Pei等人2001)主要基于模式增长方法,对于每个频繁的前缀子序列,只需考虑其相关的后缀子序列,并递归投影数据库变成较小的。许多作者提倡使用约束来根据最终用户的兴趣专注于最有希望的模式。前k个SPM问题还用于解决阈值设置的困难并控制解决方案的数量。为SPM和top- k开发的最新方法SPM虽然有效,但是却被锁定在一个比较严格的搜索策略中,并且缺乏声明性和灵活性。确实,添加新约束通常意味着更改算法核心中使用的数据结构,并且将这些新约束组合在一起通常需要进行新的开发。最近的工作(例如,Kemmar等人,2014;Négrevergne和Guns,2015)已​​经研究了约束编程(CP)在SPM中的使用。然而,尽管它们的声明性方面很好,但是由于其约束网络的巨大规模,所有这些建模都存在缩放问题。为了解决这个问题,我们提出了“前缀-投影”全局约束,该约束封装了子序列关系以及频率约束。它的过滤算法依赖于计划数据库的原理,该原理允许保留变量域中的值,仅导致数据库中频繁出现模式的值。前缀投影过滤算法在多项式时间内在当前频繁前缀之后的变量上强制执行域一致性。此全局约束还允许直接实施其他约束,例如大小,项目成员资格,正则表达式以及它们的任意组合。实验结果表明,我们的方法明显优于现有的CP方法,并且可以与大型数据集上的最新方法相媲美,以挖掘频繁的顺序模式,各种约束下的顺序模式以及 前缀投影过滤算法在多项式时间内在当前频繁前缀之后的变量上强制执行域一致性。此全局约束还允许直接实施其他约束,例如大小,项目成员资格,正则表达式以及它们的任意组合。实验结果表明,我们的方法明显优于现有的CP方法,并且可以与大型数据集上的最新方法相媲美,以挖掘频繁的顺序模式,各种约束下的顺序模式以及 前缀投影过滤算法对多项式时间内当前频繁前缀之后的变量强制执行域一致性。此全局约束还允许直接实施其他约束,例如大小,项目成员资格,正则表达式以及它们的任意组合。实验结果表明,我们的方法明显优于现有的CP方法,并且可以与大型数据集上的最新方法相媲美,以挖掘频繁的顺序模式,各种约束下的顺序模式以及k个连续模式。与现有的CP方法不同,我们的方法可实现更好的可伸缩性。
更新日期:2016-08-26
down
wechat
bug