SALSA: Self-Adjusting Lean Streaming Analytics
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-02-24, DOI: arxiv-2102.12531
Ran Ben Basat, Gil Einziger, Michael Mitzenmacher, Shay Vargaftik

Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open-source [1].
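The merging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `SalsaRow`, the `base_bits` parameter, the aligned power-of-two block layout, and the choice to sum the values of merged counters are all assumptions made for illustration. The released code may encode merges and combine values differently.

```python
class SalsaRow:
    """One row of self-merging counters (illustrative sketch, not the paper's code)."""

    def __init__(self, num_slots: int, base_bits: int = 8):
        # Assumption: a power-of-two slot count, so aligned blocks can keep doubling.
        if num_slots <= 0 or num_slots & (num_slots - 1):
            raise ValueError("num_slots must be a positive power of two")
        self.base_bits = base_bits
        self.values = [0] * num_slots  # a block's value lives in its first slot
        self.levels = [0] * num_slots  # the block covering slot i spans 2**levels[i] slots

    def _block(self, slot: int) -> tuple[int, int]:
        """Return (first slot, slot count) of the block that contains `slot`."""
        size = 1 << self.levels[slot]
        return slot - slot % size, size

    def add(self, slot: int, amount: int = 1) -> None:
        start, size = self._block(slot)
        self.values[start] += amount
        # While the value does not fit in the block's bits, merge with the aligned neighbor.
        while self.values[start] >= 1 << (self.base_bits * size):
            if size == len(self.values):
                raise OverflowError("the row is already a single counter")
            size *= 2
            start -= start % size
            # Assumption: merged counters are summed; non-leading slots always hold 0.
            merged = sum(self.values[start:start + size])
            level = size.bit_length() - 1
            for i in range(start, start + size):
                self.values[i] = 0
                self.levels[i] = level
            self.values[start] = merged

    def read(self, slot: int) -> int:
        start, _ = self._block(slot)
        return self.values[start]


row = SalsaRow(4, base_bits=8)
row.add(1, 10)
row.add(0, 255)      # still fits in 8 bits
row.add(0, 1)        # 256 overflows 8 bits, so slots 0 and 1 merge into one 16-bit counter
print(row.read(0))   # 266
print(row.read(1))   # 266 (slot 1 now shares the merged counter)
print(row.read(2))   # 0   (slot 2 is still an independent 8-bit counter)
```

In this sketch a sketching scheme such as Count-Min Sketch would hash each item to a slot and call `add` and `read` on it. Summing on merge keeps every read an overestimate of the slot's own total, at the price of extra collision error for the merged neighbors.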

Updated: 2021-02-26