A Novel Pruning Strategy for Mining Discriminative Patterns
Iranian Journal of Science and Technology, Transactions of Electrical Engineering (IF 2.4). Pub Date: 2021-01-05. DOI: 10.1007/s40998-020-00397-3
Nader Aryabarzan , Behrouz Minaei-Bidgoli

Discriminative patterns are sets of characteristics that differentiate multiple groups from each other, for example, successful and unsuccessful medical treatments. The objective of the discriminative pattern mining task is to discover a set of significant patterns that occur with disproportionate frequencies across class-labeled datasets, typically a dataset $D^{+}$ against a dataset $D^{-}$. The task faces two important problems: (1) the large search space problem, where the search space grows exponentially with the number of items, and (2) the redundancy problem, where the discriminative power of many patterns derives mainly from their sub-patterns. The common way to overcome the large search space problem is to discover frequent patterns in $D^{+}$ and use them as candidate discriminative patterns. In this paper, (1) we introduce a novel pruning strategy to reduce the search space: it generates a new dataset $D^{new} = D^{+} - D^{-}$ and uses the frequent patterns in it as candidate discriminative patterns. This idea raises a further question: how can it be implemented efficiently? (2) Note that we do not explicitly compute $D^{+} - D^{-}$. To mine the frequent patterns of $D^{+} - D^{-}$ directly, we propose a prefix tree, dubbed the DDP-tree. This tree is built directly from $D^{+}$ and $D^{-}$ and contains the essential information about the frequent patterns in $D^{+} - D^{-}$. (3) To show the effectiveness of this strategy, we propose an algorithm based on it, dubbed DiffNRDP-Miner (difference-based non-redundant discriminative pattern miner). The advantages of DiffNRDP-Miner are that it removes redundant patterns and needs only one parameter to be set, unlike other algorithms that require several.
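The pruning strategy above can be read as: cancel each transaction of $D^{-}$ against an identical transaction of $D^{+}$, and mine frequent patterns only in what remains. A minimal sketch of that reading follows; the multiset difference and the naive enumeration are illustrative stand-ins (the paper's DDP-tree avoids materializing $D^{new}$ at all), and the example data are invented.

```python
from collections import Counter
from itertools import combinations

def dataset_difference(d_pos, d_neg):
    """Multiset difference D+ - D-: each transaction in d_neg cancels
    one identical transaction in d_pos (illustrative reading of D^new)."""
    neg = Counter(frozenset(t) for t in d_neg)
    d_new = []
    for t in d_pos:
        key = frozenset(t)
        if neg[key] > 0:
            neg[key] -= 1          # cancelled by a matching D- transaction
        else:
            d_new.append(t)        # survives into D^new
    return d_new

def frequent_patterns(dataset, min_sup):
    """Naive frequent-itemset enumeration over a small dataset
    (a brute-force stand-in for mining D^new)."""
    counts = Counter()
    for t in dataset:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    return {p: c for p, c in counts.items() if c >= min_sup}

d_pos = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
d_neg = [{"a", "b"}, {"b", "c"}]
d_new = dataset_difference(d_pos, d_neg)      # [{'a','b'}, {'a','c'}]
cands = frequent_patterns(d_new, min_sup=2)   # {('a',): 2}
```

Patterns frequent in $D^{+}$ but equally frequent in $D^{-}$ (here, $\{a,b\}$ and $\{b,c\}$) never enter the candidate set, which is the source of the pruning.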
Experimental results on benchmark datasets demonstrate that this strategy (1) generates good patterns, most of which are discriminative, (2) significantly reduces the search space, and (3) does not decrease the discriminative information of the patterns.
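The abstract states only that the DDP-tree is a prefix tree built directly from $D^{+}$ and $D^{-}$; the sketch below shows one hypothetical way such a tree could carry the difference information, by adding +1 to path counts for $D^{+}$ transactions and -1 for $D^{-}$ transactions. The node layout and the signed-count scheme are assumptions for illustration, not the paper's actual data structure.

```python
class Node:
    """One prefix-tree node. `count` can go negative when a prefix
    occurs more often in D- than in D+ (hypothetical encoding)."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def insert(root, transaction, delta):
    """Insert a transaction along a sorted-item path, adding `delta`
    (+1 for D+, -1 for D-) to every node on the path."""
    node = root
    for item in sorted(transaction):
        node = node.children.setdefault(item, Node(item))
        node.count += delta

root = Node(None)
for t in [{"a", "b"}, {"a", "b"}, {"a", "c"}]:   # transactions of D+
    insert(root, t, +1)
for t in [{"a", "b"}]:                           # transactions of D-
    insert(root, t, -1)
# net path counts: a=2, a->b=1, a->c=1
```

Under this encoding a single pass over both datasets suffices, and a path's net count plays the role of its frequency in $D^{+} - D^{-}$ without ever materializing that dataset.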

Updated: 2021-01-05