Reservoir-based sampling over large graph streams to estimate triangle counts and node degrees
Future Generation Computer Systems (IF 6.2), Pub Date: 2020-03-02, DOI: 10.1016/j.future.2020.02.077
Lingling Zhang, Hong Jiang, Fang Wang, Dan Feng, Yanwen Xie

Reservoir sampling is widely used to characterize large graph streams by producing edge samples. However, existing reservoir-based sampling methods focus mainly on counting triangles and perform poorly when analyzing the topological characteristics reflected by node degrees. This paper proposes a new method, called triangle-induced reservoir sampling, or T-Sample, that counts triangles and estimates node degrees simultaneously and efficiently. T-Sample processes every edge in a graph stream only once, using a carefully designed dual sampling mechanism that performs both uniform and non-uniform sampling. Specifically, T-Sample's uniform sampling counts triangles with a newly proposed method whose estimation variance is smaller than that of existing reservoir-based sampling methods, whereas its non-uniform sampling ensures that the edge samples are connected. Experiments on real datasets show that T-Sample counts triangles with smaller estimation errors and variances than state-of-the-art reservoir-based sampling methods, while obtaining much more accurate information about node degrees at lower time and memory costs.
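For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of conventional uniform reservoir sampling over an edge stream with a scaled triangle estimate. It is not T-Sample: the abstract does not specify T-Sample's dual sampling mechanism, so none of it is reproduced here. The class name `ReservoirTriangleCounter`, its methods, and the `capacity` parameter are hypothetical names chosen for illustration, and the sketch assumes an undirected stream with no self-loops and no repeated edges.

```python
import random


class ReservoirTriangleCounter:
    """Uniform edge reservoir with a scaled triangle estimate (generic baseline, not T-Sample)."""

    def __init__(self, capacity, seed=None):
        if capacity < 3:
            raise ValueError("capacity must be at least 3 to hold a triangle")
        self.capacity = capacity      # maximum number of edges kept in memory
        self.rng = random.Random(seed)
        self.edges = []               # sampled edges
        self.adj = {}                 # adjacency sets of the sampled subgraph
        self.seen = 0                 # number of stream edges processed so far
        self.sample_triangles = 0     # triangles currently inside the sample

    def _common_neighbors(self, u, v):
        return len(self.adj.get(u, set()) & self.adj.get(v, set()))

    def _add(self, u, v):
        # Every common neighbor of u and v closes one new triangle in the sample.
        self.sample_triangles += self._common_neighbors(u, v)
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)
        self.edges.append((u, v))

    def _evict_random(self):
        # Swap a uniformly chosen edge to the end, pop it, and undo its triangles.
        i = self.rng.randrange(len(self.edges))
        self.edges[i], self.edges[-1] = self.edges[-1], self.edges[i]
        u, v = self.edges.pop()
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.sample_triangles -= self._common_neighbors(u, v)

    def process(self, u, v):
        """Process one stream edge exactly once."""
        self.seen += 1
        if self.seen <= self.capacity:
            self._add(u, v)
        elif self.rng.random() < self.capacity / self.seen:
            # Standard reservoir step: keep the new edge with probability capacity / seen.
            self._evict_random()
            self._add(u, v)

    def estimate(self):
        """Scale the in-sample count by the inverse probability that all three edges survive."""
        t, m = self.seen, self.capacity
        scale = max(1.0, t * (t - 1) * (t - 2) / (m * (m - 1) * (m - 2)))
        return self.sample_triangles * scale


# Complete graph on 4 nodes: 6 edges, 4 triangles.
K4_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

counter = ReservoirTriangleCounter(capacity=10, seed=0)
for u, v in K4_EDGES:
    counter.process(u, v)
print(counter.estimate())  # 4.0, exact because the whole stream fits in the reservoir
```

When the stream is longer than `capacity`, the estimate becomes a random quantity whose variance grows as the reservoir shrinks, and degrees read from such a uniform sample are correspondingly noisy. These are the weaknesses of existing methods that the abstract says T-Sample addresses.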




Updated: 2020-03-02