

Anytime mining of sequential discriminative patterns in labeled sequences
Knowledge and Information Systems (IF 2.7), Pub Date: 2020-11-10, DOI: 10.1007/s10115-020-01523-7
Romain Mathonat, Diana Nurbakova, Jean-François Boulicaut, Mehdi Kaytoue

It is extremely useful to exploit labeled datasets not only to learn models and perform predictive analytics but also to improve our understanding of a domain and its available targeted classes. The subgroup discovery task has been studied for more than two decades. It concerns the discovery of patterns covering sets of objects that have interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for both transactional and numerical data, discovering subgroups within labeled sequential data has been much less studied. First, we propose an anytime algorithm, SeqScout, that discovers interesting subgroups w.r.t. a chosen quality measure. This is a sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. For a given budget, it finds a collection of local optima in the search space of descriptions and thus of subgroups. It requires little configuration and is independent of the quality measure used for pattern scoring. We also introduce a second anytime algorithm, MCTSExtent, that pushes further the idea of a better trade-off between exploration and exploitation in a sampling strategy over the search space. To the best of our knowledge, this is the first time that the Monte Carlo Tree Search framework has been exploited in a sequential data mining setting. We have conducted a thorough evaluation of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results.
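To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how a sequential pattern can be scored against a target class and how a bandit policy can score an arm. The abstract does not name a specific quality measure or bandit policy, so weighted relative accuracy (WRAcc) and UCB1 are assumptions chosen because they are common in subgroup discovery and bandit settings. Sequences are simplified to strings of single items, and all function names are hypothetical.

```python
import math


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def wracc(pattern, dataset, target):
    """Weighted relative accuracy of `pattern` for class `target` (assumed measure).

    `dataset` is a list of (sequence, label) pairs.
    """
    n = len(dataset)
    covered = [label for seq, label in dataset if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    p_target = sum(1 for _, label in dataset if label == target) / n
    p_target_in_cover = sum(1 for label in covered if label == target) / len(covered)
    # Coverage times the gain in target frequency inside the subgroup.
    return (len(covered) / n) * (p_target_in_cover - p_target)


def ucb1(mean_reward, pulls, total_pulls, c=math.sqrt(2)):
    """UCB1 score of one arm (assumed policy); unvisited arms are tried first."""
    if pulls == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(total_pulls) / pulls)


# Toy labeled sequences: "a" followed later by "b" appears only in class "+".
dataset = [
    ("abcd", "+"),
    ("acbd", "+"),
    ("dcba", "-"),
    ("bdac", "-"),
]
quality = wracc("ab", dataset, "+")
```

In this toy dataset the pattern `"ab"` covers exactly the two `"+"` sequences, so its WRAcc is 0.5 × (1.0 − 0.5) = 0.25. In a sampling loop, a score such as `ucb1` would decide which arm to explore next, balancing arms with high observed quality against arms that have rarely been tried.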


Updated: 2020-11-12
