Age-Partitioned Bloom Filters
arXiv - CS - Databases Pub Date : 2020-01-09 , DOI: arxiv-2001.03147
Ariel Shtul, Carlos Baquero, and Paulo Sérgio Almeida

Bloom filters (BFs) are widely used for approximate membership queries over a set of elements. BF variants support removals, sets of unbounded size, or queries over a sliding window on an unbounded stream. For this last case, however, the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive with dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters (APBFs), a BF-based approach for duplicate detection in sliding windows that is not only competitive in time complexity but also uses less space than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. Unlike dictionary-based approaches, APBFs retain the simplicity of BFs, which is important for hardware-based implementations, and they can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant that can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high-accuracy filters.
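
For readers unfamiliar with the double hashing improvement mentioned in the abstract, the following is a minimal sketch of a plain Bloom filter that derives all of its probe positions from two base hash values. It is not the paper's APBF and has no age partitioning or sliding window. The class name `DoubleHashBloomFilter`, its parameters, and the use of SHA-256 as the base hash are assumptions made for illustration only.

```python
import hashlib


class DoubleHashBloomFilter:
    """Illustrative plain Bloom filter (hypothetical name, not the paper's APBF)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits        # m: size of the bit array
        self.num_hashes = num_hashes    # k: probes per element
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Split one SHA-256 digest into two 64-bit base hashes (an assumption;
        # any two reasonably independent hash functions would do).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd, hence non-zero
        # Double hashing: the i-th probe is (h1 + i * h2) mod m.
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

With this sketch, `bf = DoubleHashBloomFilter(1024, 4)` followed by `bf.add(b"key")` makes `b"key" in bf` true. Items that were never added usually test false, with a false-positive rate that depends on the filter size, the number of probes, and how many items have been inserted.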

Updated: 2020-01-10