Fast Succinct Retrieval and Approximate Membership using Ribbon
arXiv - CS - Data Structures and Algorithms Pub Date : 2021-09-04 , DOI: arxiv-2109.01892
Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, Stefan Walzer

A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ) (i.e., a Bloom filter alternative) with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space-time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below $1\,\%$ while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $\geq 44\,\%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.
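
To make the two reductions in the abstract concrete (retrieval of r-bit values, and an AMQ built from it with false positive rate about 2^-r), here is a minimal sketch. It assumes the general ribbon idea of a banded ("ribbon") linear system over GF(2). Each key hashes to a start column and a w-bit coefficient word. Construction runs on-the-fly Gaussian elimination followed by back substitution. A query XORs the selected solution rows. This is plain *standard* ribbon with retries, not BuRR: bumping, the paper's word-parallel layout, and its hashing and parameter choices are not reproduced. The names `RibbonRetrieval`, `RibbonFilter`, `_h64`, the salts, `w = 64` and `overhead = 0.2` are hypothetical choices made for illustration.

```python
import hashlib


def _h64(key: bytes, salt: int) -> int:
    """64-bit hash of `key` under an integer salt (blake2b is an arbitrary choice)."""
    data = salt.to_bytes(8, "little") + key
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class RibbonRetrieval:
    """Hypothetical sketch of *standard* (non-bumped) ribbon retrieval over GF(2).

    Each key x yields one equation: the XOR of Z[start + j] over the set bits j
    of a w-bit coefficient word equals f(x). The solution Z occupies m*r bits.
    """

    def __init__(self, items: dict, r: int, w: int = 64, overhead: float = 0.2):
        if any(not 0 <= v < (1 << r) for v in items.values()):
            raise ValueError("values must fit in r bits")
        self.r, self.w = r, w
        self.m = int(len(items) * (1 + overhead)) + w  # number of r-bit slots
        for seed in range(100):  # retry with fresh hash functions on failure
            self.seed = seed
            if self._build(items):
                return
        raise RuntimeError("construction failed; increase overhead or w")

    def _row(self, key: bytes):
        start = _h64(key, 2 * self.seed) % (self.m - self.w + 1)
        coeff = (_h64(key, 2 * self.seed + 1) & ((1 << self.w) - 1)) | 1  # bit 0 set
        return start, coeff

    def _build(self, items: dict) -> bool:
        C = [0] * self.m  # pivot row i: coefficient word, bit k <-> column i + k
        B = [0] * self.m  # right-hand side (r-bit value) of pivot row i
        for key, b in items.items():
            s, c = self._row(key)
            while True:  # on-the-fly Gaussian elimination
                if C[s] == 0:  # free pivot: store the reduced equation here
                    C[s], B[s] = c, b
                    break
                c ^= C[s]
                b ^= B[s]
                if c == 0:  # equation became dependent on earlier ones
                    if b != 0:
                        return False  # inconsistent: construction fails
                    break  # redundant but consistent
                shift = (c & -c).bit_length() - 1  # number of trailing zeros
                c >>= shift
                s += shift
        # Back substitution from the last column down to the first.
        Z = [0] * self.m
        for i in range(self.m - 1, -1, -1):
            v, c, k = B[i], C[i] >> 1, i + 1
            while c:
                if c & 1:
                    v ^= Z[k]
                c >>= 1
                k += 1
            Z[i] = v  # columns without a pivot end up as 0
        self.Z = Z
        return True

    def query(self, key: bytes) -> int:
        s, c = self._row(key)
        v = 0
        while c:
            if c & 1:
                v ^= self.Z[s]
            c >>= 1
            s += 1
        return v


class RibbonFilter:
    """AMQ from retrieval: store an r-bit fingerprint per key; FPR about 2**-r."""

    FP_SALT = 0xF00D  # hypothetical salt, distinct from the retrieval salts

    def __init__(self, keys, r: int = 8):
        self.mask = (1 << r) - 1
        self.ret = RibbonRetrieval({k: self._fp(k) for k in keys}, r)

    def _fp(self, key: bytes) -> int:
        return _h64(key, self.FP_SALT) & self.mask

    def contains(self, key: bytes) -> bool:
        return self.ret.query(key) == self._fp(key)


keys = [f"key{i}".encode() for i in range(1000)]
flt = RibbonFilter(keys, r=8)
assert all(flt.contains(k) for k in keys)  # no false negatives
fp = sum(flt.contains(f"other{i}".encode()) for i in range(10_000)) / 10_000
print(f"observed FPR {fp:.4f}, target 2^-8 = {2 ** -8:.4f}")
```

Keys in S always match because the stored value is exactly their fingerprint. For a key outside S, the retrieved value is unrelated to its independently hashed fingerprint, so a match occurs with probability about 2^-r. Python integers stand in for machine words here for readability. The 20 % slack this sketch uses is far above the sub-1 % overhead the abstract reports for BuRR.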

Updated: 2021-09-07