A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

The Journal of Supercomputing (IF 2.5), Pub Date: 2020-03-18, DOI: 10.1007/s11227-020-03247-5
Krishan Kumar Sethi, Dharavath Ramesh

High-utility itemset mining is a prominent data-mining technique in which the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining extends high-utility itemset mining by introducing an unbiased measure, the average utility, which relates the utility of an itemset to its length. Several existing HAUI mining algorithms rely on upper bounds such as the average-utility upper bound, the revised tighter upper bound, and the looser upper bound to support pruning. However, these upper bounds overestimate the average utility of itemsets and slow down the mining process. This paper presents the fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid processing unpromising candidate itemsets. Moreover, a novel list structure named the recommended average-utility list (RAUL) is presented to store the average utility and the information required for pruning. The RAUL for an itemset can be constructed by joining the RAULs of its subsets, which avoids excessive database scans. We performed substantial experiments on various benchmark datasets to evaluate the performance of FHAIM against two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability.
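
To make the average-utility measure and the role of an upper bound concrete, here is a minimal sketch. It is not the FHAIM algorithm and does not reproduce the RAUL structure or the paper's two improved bounds, none of which are specified in the abstract. It assumes the usual definitions from the HAUI literature: the average utility of an itemset is its total utility over the transactions that contain it, divided by its length, and the average-utility upper bound (auub) sums the maximum item utility of each such transaction. The toy database `DB` and the function names are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> utility in that transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def average_utility(itemset, db):
    """Total utility of the itemset over supporting transactions, divided by its length."""
    items = set(itemset)
    total = sum(
        sum(t[i] for i in items)      # utility of the itemset in transaction t
        for t in db
        if items.issubset(t)          # t supports the itemset
    )
    return total / len(items)


def auub(itemset, db):
    """Classic average-utility upper bound: sum of each supporting
    transaction's maximum item utility."""
    items = set(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


if __name__ == "__main__":
    for x in [("a",), ("c",), ("a", "c")]:
        print(x, average_utility(x, DB), auub(x, DB))
```

In this toy data the bound for the single item `c` is 23 while its average utility is only 10, which illustrates the kind of overestimation the abstract says slows mining and motivates tighter bounds.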

Updated: 2020-03-18
