GeoDenStream: An improved DenStream clustering method for managing entity data within geographical data streams
Computers & Geosciences (IF 4.4), Pub Date: 2020-11-01, DOI: 10.1016/j.cageo.2020.104563
Manqi Li, Arie Croitoru, Songshan Yue

Abstract Data streams have become an integral part of the rapidly evolving modern information landscape in various application domains. Stream clustering, and in particular density-based clustering, has emerged as one of the most commonly used data stream analysis tasks. Several density-based stream clustering methods have been proposed; chief among them is DenStream. Existing DenStream clustering methods usually preserve only key summary descriptors of each cluster, such as its center and radius. Such an approach is not suitable for streams that observe discrete entities, since the clustering process does not maintain the entity-level composition of each cluster over time. The primary challenge we explore in this paper is therefore how existing DenStream clustering methods can be enhanced to support entity-based stream mining in geographical space. To address this challenge, this paper presents GeoDenStream, a spatiotemporal entity-based stream clustering method. Building on DenStream, GeoDenStream is particularly suitable for clustering discrete entities because it can track the relationship between entities and clusters over time and can recover data that have been incorrectly labeled as noise. Memory efficiency in GeoDenStream is achieved through a combination of data pruning and indexing. The performance of GeoDenStream was evaluated with both synthetic stream data and real-world stream data from a popular social media platform (Twitter). The results of these evaluations show that GeoDenStream is able to efficiently handle memory constraints, overlapping data points, and false noise.
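
To illustrate the distinction the abstract draws between summary-only clusters and entity-level tracking, the following is a minimal sketch, not the authors' implementation. It assumes a two-dimensional micro-cluster with exponentially faded weight, linear sum, and squared sum, in the spirit of DenStream-style summaries, and extends it with a set of member entity IDs. The class name `EntityMicroCluster`, the `decay` parameter, and the method names are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EntityMicroCluster:
    """Faded 2-D micro-cluster that also remembers which entities it absorbed.

    Hypothetical illustration only; names and parameters are assumptions.
    """
    decay: float                       # assumed fading rate (often written lambda)
    weight: float = 0.0                # faded count of absorbed points
    linear_sum: list = field(default_factory=lambda: [0.0, 0.0])
    squared_sum: list = field(default_factory=lambda: [0.0, 0.0])
    last_update: float = 0.0
    entity_ids: set = field(default_factory=set)  # entity-level composition

    def _fade(self, now):
        # Down-weight all summary statistics by 2^(-decay * elapsed time).
        factor = 2.0 ** (-self.decay * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, entity_id, x, y, now):
        # Fade the old statistics, then absorb the new point and record its ID.
        self._fade(now)
        self.weight += 1.0
        for i, v in enumerate((x, y)):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v
        self.entity_ids.add(entity_id)

    @property
    def center(self):
        return tuple(v / self.weight for v in self.linear_sum)

    @property
    def radius(self):
        # Root of the summed per-axis weighted variance, clamped at zero.
        var = sum(
            s / self.weight - (l / self.weight) ** 2
            for s, l in zip(self.squared_sum, self.linear_sum)
        )
        return math.sqrt(max(var, 0.0))


mc = EntityMicroCluster(decay=0.0)   # decay=0 disables fading for this demo
mc.insert("a", 0.0, 0.0, now=0.0)
mc.insert("b", 2.0, 0.0, now=1.0)
print(mc.center, mc.radius, sorted(mc.entity_ids))
```

A summary-only method would keep just the center and radius; retaining `entity_ids` is what allows a cluster's membership to be queried later, at the cost of extra memory that the abstract says GeoDenStream controls through pruning and indexing (not shown here).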

Updated: 2020-11-01