High average-utility itemset mining with multiple minimum utility threshold: A generalized approach
Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2020-09-10, DOI: 10.1016/j.engappai.2020.103933
Krishan Kumar Sethi, Dharavath Ramesh

High Average-Utility Itemset (HAUI) mining is an emerging pattern mining technique for extracting meaningful patterns from a transaction dataset. Several HAUI mining algorithms have been developed with efficient upper bounds and pruning strategies. However, all of these algorithms apply a single minimum average-utility threshold to every itemset, which limits their applicability to real-life datasets. To address this issue, several HAUI mining algorithms with multiple average-utility thresholds have been developed; they process the items in ascending order of their minimum average-utility threshold. However, this ordering makes them incompatible with traditional HAUI mining algorithms. Moreover, perturbing the preference order of the items may reduce the performance of the algorithms. This paper presents an HAUI mining algorithm named Generalized High Average-utility Itemset Miner (GHAIM) that, like the traditional HAUI mining algorithms, processes the items in ascending order of their Average Utility Upper-Bound (AUUB). A new approach named suffix minimum average-utility is proposed to retain the downward closure property of AUUB and several pruning methods. In addition, a compact list structure is proposed to mine the HAUIs in one phase. Several pruning methods are introduced to reduce the search space and improve efficiency. Extensive experiments were performed on sparse and dense datasets to compare the efficiency of GHAIM with that of two existing algorithms. The results show that GHAIM outperforms both existing algorithms in run time, memory consumption, number of candidate itemsets, and scalability.
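
For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of average utility and AUUB as they are commonly defined in the HAUI literature. It is not the GHAIM algorithm and does not implement the suffix minimum average-utility or the list structure. The toy database `DB` and the function names are assumptions made for illustration, and each transaction is assumed to map an item directly to its utility in that transaction.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: one dict per transaction, item -> utility.
Transaction = Dict[str, float]

# Toy database invented for this sketch (not taken from the paper).
DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def average_utility(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Sum, over transactions containing the itemset, of the itemset's
    utility in that transaction divided by the itemset's length."""
    items = set(itemset)  # assumed non-empty
    total = 0.0
    for t in db:
        if items.issubset(t):  # iterating a dict yields its keys
            total += sum(t[i] for i in items) / len(items)
    return total


def auub(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Average-utility upper bound: sum of the maximum item utility of
    every transaction that contains the itemset."""
    items = set(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


if __name__ == "__main__":
    print(average_utility({"a", "c"}, DB))  # (5+1)/2 + (10+6)/2 = 11.0
    print(auub({"a", "c"}, DB))             # 5 + 10 = 15
    print(auub({"c"}, DB))                  # 5 + 10 + 8 = 23
```

In this toy data the bound of {a, c} (15) is at least its average utility (11.0) and no larger than the bound of its subset {c} (23), which is the downward closure behaviour that AUUB-based pruning depends on.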




Updated: 2020-09-10