High Occupancy Itemset Mining with Consideration of Transaction Occupancy
Arabian Journal for Science and Engineering (IF 2.6), Pub Date: 2021-09-09, DOI: 10.1007/s13369-021-06075-8
Subrata Datta¹, Kalyani Mali¹, Udit Ghosh²

Discovering high occupancy itemsets is an interesting area of research in data mining. In traditional approaches, occupancy computation considers only the portion of each supporting transaction that the itemset occupies. As a result, it cannot distinguish between the occupancies of the same itemset in different supporting transactions of equal length. Occupancy is also highest when the itemset size equals the transaction length, which promotes the generation of undesirable itemsets, especially isolated ones. Furthermore, itemsets of equal size have equal average occupancies even when they appear in different transactions of equal length. To address these issues, this paper introduces the concept of transaction occupancy (TO) and then presents a new computational model of itemset occupancy (IO) that accounts for transaction occupancy. Transaction occupancy refers to the portion of the database occupied by the transactions. This paper proposes an efficient list-structure-based algorithm called HOIMTO (high occupancy itemset mining with transaction occupancy) to discover high occupancy itemsets (HOIs) from transactional databases. A new itemset occupancy upper bound (IOUB) is also introduced to reduce the candidate search space. Experimental studies show the effectiveness of the proposed approach in terms of itemset generation, runtime, memory usage and scalability.
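The abstract does not state its formulas, so the following is only a minimal sketch that illustrates the limitations it describes, under loudly stated assumptions. It assumes the traditional per-transaction occupancy of an itemset X in a supporting transaction T is the ratio |X| / |T|, and that the traditional itemset occupancy is the average of that ratio over all supporting transactions. It also assumes one plausible reading of transaction occupancy: the share of all item occurrences in the database held by T. The toy database and all function names (`supporting`, `ratio_occupancy`, `average_occupancy`, `transaction_occupancy`) are hypothetical. This is not the paper's IO model, IOUB bound, or HOIMTO algorithm.

```python
from fractions import Fraction

# Hypothetical toy transaction database, for illustration only.
DB = {
    "T1": {"a", "b", "c", "d"},
    "T2": {"a", "b", "e", "f"},
    "T3": {"a", "b"},
    "T4": {"g", "h"},
}


def supporting(itemset, db):
    """Return the IDs of transactions that contain every item of `itemset`."""
    return [tid for tid, items in db.items() if itemset <= items]


def ratio_occupancy(itemset, tid, db):
    """ASSUMED traditional per-transaction occupancy: |X| / |T|."""
    return Fraction(len(itemset), len(db[tid]))


def average_occupancy(itemset, db):
    """ASSUMED traditional measure: mean of |X| / |T| over supporting transactions."""
    tids = supporting(itemset, db)
    if not tids:
        return Fraction(0)
    return sum(ratio_occupancy(itemset, t, db) for t in tids) / len(tids)


def transaction_occupancy(tid, db):
    """ASSUMED reading of TO: fraction of all item occurrences in the database held by T."""
    total = sum(len(items) for items in db.values())
    return Fraction(len(db[tid]), total)
```

With this toy data, {a, b} receives the same ratio (1/2) in T1 and in T2, because both transactions have length 4. The isolated itemset {g, h}, which fills its only transaction T4, reaches the maximum average occupancy of 1, above {a, b} (2/3). The size-2 itemsets {a, c} and {e, f} both average 1/2 although they occur in different transactions. Under the assumed definition, the transaction occupancies sum to 1 (T1 holds 1/3 of the database, T4 holds 1/6), which gives a per-transaction weight that a model like the paper's IO could take into account.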

Updated: 2021-09-09