Hyper-structure mining of frequent patterns in uncertain data streams.
Knowledge and Information Systems (IF 2.7), Pub Date: 2012-11-01, DOI: 10.1007/s10115-012-0581-y
Chandima Hewanadungodage 1 , Yuni Xia 1 , Jaehwan John Lee 2 , Yi-Cheng Tu 3

Data uncertainty is inherent in many real-world applications such as sensor monitoring systems, location-based services, and medical diagnostic systems. Moreover, many real-world applications are now capable of producing continuous, unbounded data streams. In recent years, new methods have been developed to find frequent patterns in uncertain databases; nevertheless, very limited work has been done on discovering frequent patterns in uncertain data streams. The current solutions for frequent pattern mining in uncertain streams take an FP-tree-based approach; however, recent studies have shown that FP-tree-based algorithms do not perform well in the presence of data uncertainty. In this paper, we propose two hyper-structure-based, false-positive-oriented algorithms to efficiently mine frequent itemsets from streams of uncertain data. The first algorithm, UHS-Stream, is designed to find all frequent itemsets up to the current moment. The second algorithm, TFUHS-Stream, is designed to find frequent itemsets in an uncertain data stream in a time-fading manner. Experimental results show that the proposed hyper-structure-based algorithms outperform the existing tree-based algorithms in terms of accuracy, runtime, and memory usage.
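
The abstract does not spell out how support is measured or how fading is applied, so the following is only a rough sketch of the general setting, not UHS-Stream or TFUHS-Stream themselves. It assumes the common expected-support model (each item carries an independent existence probability, and an itemset's probability in a transaction is the product of its items' probabilities) and a simple multiplicative decay per incoming transaction. The names `expected_supports`, `frequent`, `max_len`, and `fade` are hypothetical, and the brute-force enumeration stands in for the paper's hyper-structure.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def expected_supports(stream, max_len=2, fade=1.0):
    """Accumulate expected support for itemsets of size <= max_len.

    stream: iterable of transactions, each a dict mapping item -> existence
            probability (assumed independent; an assumption of this sketch).
    fade:   decay factor in (0, 1]; 1.0 counts everything seen so far equally,
            smaller values give older transactions less weight.
    """
    support = defaultdict(float)
    for transaction in stream:
        # Age previously accumulated support before adding the new transaction.
        if fade != 1.0:
            for itemset in support:
                support[itemset] *= fade
        items = sorted(transaction)
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                # Probability that every item of the itemset is present.
                support[itemset] += prod(transaction[i] for i in itemset)
    return dict(support)


def frequent(support, min_support):
    """Keep itemsets whose expected support reaches the threshold."""
    return {s: v for s, v in support.items() if v >= min_support}


if __name__ == "__main__":
    stream = [{"a": 0.9, "b": 0.5}, {"a": 0.8, "c": 1.0}]
    print(frequent(expected_supports(stream), min_support=0.8))
    print(frequent(expected_supports(stream, fade=0.5), min_support=0.8))
```

With `fade=1.0` the sketch corresponds to counting over the whole stream up to the current moment; with `fade < 1.0` recent transactions dominate, which is the intuition behind a time-fading model. Enumerating all itemsets per transaction is exponential in `max_len` and is used here only to keep the example short.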

Updated: 2012-11-01