NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2020-01-21, DOI: 10.1007/s10618-019-00672-w
Thomas Guyet, René Quiniou

Sequential pattern mining is concerned with extracting frequent or recurrent behaviors, modeled as subsequences, from a sequence dataset. Such patterns indicate which events are frequently observed in sequences, i.e. events that actually happen. Sometimes, knowing that a specific event does not happen is more informative than extracting observed events. Negative sequential patterns (NSPs) capture recurrent behaviors with patterns that take the form of sequences mentioning both observed events and the absence of events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ across methods, which makes them difficult to compare. This article provides a unified framework for formulating the syntax and semantics of NSPs. We then introduce a new algorithm, NegPSpan, that extracts NSPs using a prefix-based depth-first scheme, enabling maxgap constraints that other approaches do not take into account. The formal framework highlights the differences between the proposed approach and methods from the literature, especially the state-of-the-art approach eNSP. Extensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.
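
To make the notion of a negative sequential pattern with a maxgap constraint more concrete, here is a minimal Python sketch of a brute-force support check. It is not the NegPSpan algorithm and does not claim to reproduce the semantics formalized in the article, which stresses that NSP semantics differ across methods. The sketch assumes one simple reading: sequences are lists of single items, a pattern such as ⟨a ¬b c⟩ holds if some occurrence of a is followed by c with no b strictly in between, and maxgap bounds the position difference between consecutive positive items. The names `supports`, `support`, `positives`, `negatives` and `maxgap` are hypothetical.

```python
from typing import List, Optional, Sequence


def supports(seq: Sequence[str],
             positives: List[str],
             negatives: List[Optional[str]],
             maxgap: Optional[int] = None) -> bool:
    """Return True if `seq` contains the pattern under the assumed semantics.

    positives: the items that must occur, in order.
    negatives: negatives[k] is the item that must NOT occur strictly between
               positives[k] and positives[k + 1] (None means no constraint).
    maxgap:    assumed to bound the position difference between two
               consecutive positive items (None means unbounded).
    """
    def search(k: int, last: int) -> bool:
        if k == len(positives):
            return True  # every positive item has been placed
        # The first item may occur anywhere; later ones must respect maxgap.
        if maxgap is None or last < 0:
            end = len(seq)
        else:
            end = min(len(seq), last + maxgap + 1)
        for i in range(last + 1, end):
            if seq[i] != positives[k]:
                continue
            if k > 0:
                forbidden = negatives[k - 1]
                if forbidden is not None and forbidden in seq[last + 1:i]:
                    continue  # forbidden item occurs in the gap
            if search(k + 1, i):
                return True
        return False

    return search(0, -1)


def support(dataset: List[Sequence[str]],
            positives: List[str],
            negatives: List[Optional[str]],
            maxgap: Optional[int] = None) -> int:
    """Count the sequences of `dataset` that contain the pattern."""
    return sum(supports(s, positives, negatives, maxgap) for s in dataset)


if __name__ == "__main__":
    data = [list("abc"), list("adc"), list("abcac"), list("adxc")]
    # Pattern <a ¬b c>, first without and then with a maxgap of 2.
    print(support(data, ["a", "c"], ["b"]))
    print(support(data, ["a", "c"], ["b"], maxgap=2))
```

Under this reading, tightening maxgap can only reduce the number of supporting sequences, since it restricts where the next positive item may be placed. An exhaustive matcher like this one only illustrates what is being counted; a prefix-based depth-first miner avoids re-scanning whole sequences for every candidate pattern.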

Updated: 2020-01-21