Efficient weighted probabilistic frequent itemset mining in uncertain databases, Expert Systems



Efficient weighted probabilistic frequent itemset mining in uncertain databases
Expert Systems ( IF 3.0 ) Pub Date : 2020-04-07 , DOI: 10.1111/exsy.12551
Zhiyang Li ₁ , Fengjuan Chen ₁ , Junfeng Wu ₂ , Zhaobin Liu ₁ , Weijiang Liu ₁


Uncertain data mining has attracted considerable interest in many emerging applications over the past decade. A problem of particular interest is discovering frequent itemsets in uncertain databases. Because the presence of an item in a transaction of such a database is not certain, several probability models have been proposed to measure the frequency of an itemset, and a frequent itemset over probabilistic data generally has two different definitions: the expected support-based frequent itemset and the probabilistic frequent itemset. Frequency alone, however, cannot identify useful or meaningful patterns in some scenarios; other measures, such as the importance of items, should also be taken into account. To this end, several recent studies have addressed weighted (importance-aware) frequent itemset mining in uncertain databases. However, they are designed only for the expected support-based frequent itemset and suffer from low efficiency because they generate too many frequent itemset candidates. To address this issue, we propose a novel algorithm for mining weighted probabilistic frequent itemsets (w-PFIs). We derive a probability model for the support of a w-PFI candidate and present three pruning techniques that narrow the search space and remove unpromising candidates immediately. Extensive experiments on both real and synthetic datasets evaluate the performance of our w-PFI algorithm in terms of runtime, accuracy and scalability. The results show that our algorithm yields the best performance among the existing algorithms.
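The two definitions of frequency named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes items occur independently within a transaction, uses a made-up toy database, and all function names are hypothetical. The weighted score at the end (mean item weight times the frequentness probability) is only one plausible way to combine weights with probabilities; the abstract does not state the paper's exact definition, and the paper's pruning techniques are not shown.

```python
from math import prod

# Toy uncertain database (made-up values): each transaction maps an item
# to the probability that the item is actually present.
DB = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.7},
    {"a": 1.0, "b": 0.4, "c": 0.6},
]

# Hypothetical item weights (importance), also made up.
WEIGHTS = {"a": 0.2, "b": 1.0, "c": 0.6}


def itemset_probs(db, itemset):
    """Per-transaction probability that all items of `itemset` are present,
    assuming items are independent within a transaction."""
    return [prod(t.get(item, 0.0) for item in itemset) for t in db]


def expected_support(db, itemset):
    """Expected support: sum of the per-transaction containment probabilities."""
    return sum(itemset_probs(db, itemset))


def support_distribution(db, itemset):
    """dist[k] = P(support == k), built one transaction at a time
    (dynamic programming over a Poisson binomial distribution)."""
    dist = [1.0]
    for p in itemset_probs(db, itemset):
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += q * p       # transaction contains the itemset
        dist = nxt
    return dist


def frequentness_probability(db, itemset, minsup):
    """P(support >= minsup)."""
    return sum(support_distribution(db, itemset)[minsup:])


def is_probabilistic_frequent(db, itemset, minsup, tau):
    """Probabilistic frequent itemset test: P(support >= minsup) >= tau."""
    return frequentness_probability(db, itemset, minsup) >= tau


def weighted_frequentness(db, itemset, minsup, weights):
    """ASSUMPTION: mean item weight times the frequentness probability.
    This is an illustrative choice, not a definition taken from the paper."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * frequentness_probability(db, itemset, minsup)


if __name__ == "__main__":
    for itemset in [("a",), ("a", "b")]:
        print(itemset,
              expected_support(DB, itemset),
              frequentness_probability(DB, itemset, 2),
              weighted_frequentness(DB, itemset, 2, WEIGHTS))
```

In this toy data the itemset ("a",) has a high frequentness probability but a low weight, while ("a", "b") has a higher mean weight but is less likely to be frequent, which is the kind of trade-off that motivates combining the two measures.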


Updated: 2020-04-07
