ProSecCo: progressive sequence mining with convergence guarantees
Knowledge and Information Systems (IF 2.5), Pub Date: 2019-08-20, DOI: 10.1007/s10115-019-01393-8
Sacha Servan-Schreiber, Matteo Riondato, Emanuel Zgraggen

We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets. It processes the dataset in blocks and, after analyzing each block, outputs a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees, and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. Our experimental evaluation on real and artificial datasets shows that ProSecCo produces high-quality results almost immediately and that these results converge quickly. Its practical performance is even better than what the theoretical analysis guarantees, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, orders of magnitude less than other standard non-progressive sequential pattern mining algorithms.
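
The block-by-block idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `max_len` cap on pattern length, the treatment of each transaction as a plain sequence of single items, and the Hoeffding-style error term `eps` are all assumptions made for illustration. In particular, `eps` is only a placeholder for the VC-dimension-based bound that the paper derives, and this sketch keeps every count in memory, so it does not reproduce the constant-memory behaviour reported for ProSecCo.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def subsequences(seq, max_len=2):
    """Return the distinct subsequences of `seq` with 1..max_len items (order preserved)."""
    found = set()
    for k in range(1, max_len + 1):
        # combinations() keeps items in their original positional order,
        # so every tuple it yields is a subsequence of `seq`.
        found.update(combinations(seq, k))
    return found


def progressive_frequent_sequences(dataset, block_size, theta, delta=0.1, max_len=2):
    """Yield (transactions_seen, eps, estimates) after each block.

    `estimates` maps each pattern whose running frequency is at least
    theta - eps to that frequency. `eps` is a placeholder deviation term
    (an assumption of this sketch, not the paper's bound); it is set to 0
    once the whole dataset has been read, so the last result is exact
    for patterns of up to `max_len` items.
    """
    counts = Counter()
    total = len(dataset)
    for start in range(0, total, block_size):
        block = dataset[start:start + block_size]
        for seq in block:
            # Each transaction contributes at most 1 to a pattern's support.
            counts.update(subsequences(seq, max_len))
        seen = start + len(block)
        eps = 0.0 if seen == total else sqrt(log(2 / delta) / (2 * seen))
        estimates = {
            pattern: count / seen
            for pattern, count in counts.items()
            if count / seen >= theta - eps
        }
        yield seen, eps, estimates


if __name__ == "__main__":
    toy = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
    for seen, eps, estimates in progressive_frequent_sequences(toy, block_size=2, theta=0.6):
        print(seen, round(eps, 3), sorted(estimates))
```

Each intermediate result uses the lowered threshold `theta - eps`, so early outputs may contain patterns that later turn out to be infrequent, while the final pass applies `theta` itself. Any sampling-style bound of this kind presupposes that the blocks read so far behave like a random sample of the dataset.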

Updated: 2019-08-20