Fast scalable construction of ([compressed] static | minimal perfect hash) functions


Information and Computation (IF 1), Pub Date: 2020-01-15, DOI: 10.1016/j.ic.2020.104517
Marco Genuzio, Giuseppe Ottaviano, Sebastiano Vigna

Recent advances in the analysis of random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstacle to any practical application of these results is the time required to solve such linear systems: although they can be made very small, the computation is still too slow to be feasible.

In this paper, we describe in detail a number of heuristics and programming techniques to speed up the solution of these systems by orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed.
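To make the linear-system view concrete, here is a minimal Python sketch of a one-bit static function built by solving a system over GF(2). It is not the authors' implementation: it uses plain Gauss-Jordan elimination rather than the lazy variant the abstract mentions, and Python integers stand in for broadword bit vectors so that a single XOR adds two whole equations. The names `solve_gf2`, `positions`, `build_static_bit_function`, and `lookup`, the SHA-256 hashing, and the retry-over-seeds loop are all assumptions made for illustration.

```python
import hashlib


def solve_gf2(rows, rhs, num_vars):
    """Solve a linear system over GF(2) by plain Gauss-Jordan elimination.

    rows[i] is an int whose bit j is the coefficient of variable j;
    rhs[i] is 0 or 1. Returns a list of bits, or None if inconsistent.
    """
    rows, rhs = list(rows), list(rhs)
    pivot_row = {}  # variable index -> index of the row holding its pivot
    r = 0
    for col in range(num_vars):
        # Look for a not-yet-used equation that contains this variable.
        p = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if p is None:
            continue  # free variable
        rows[r], rows[p] = rows[p], rows[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]  # one XOR adds two whole equations
                rhs[i] ^= rhs[r]
        pivot_row[col] = r
        r += 1
    # An all-zero equation with right-hand side 1 cannot be satisfied.
    if any(row == 0 and b for row, b in zip(rows, rhs)):
        return None
    solution = [0] * num_vars  # free variables are set to 0
    for col, i in pivot_row.items():
        solution[col] = rhs[i]
    return solution


def positions(key, seed, m):
    """Map a key to three cells of an m-cell array (hypothetical hashing)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % m
            for i in range(3)]


def build_static_bit_function(mapping, m, max_seeds=100):
    """Find a bit array w with XOR of w over positions(key) == mapping[key]."""
    for seed in range(max_seeds):
        rows, rhs = [], []
        for key, bit in mapping.items():
            row = 0
            for p in positions(key, seed, m):
                row ^= 1 << p  # repeated positions cancel, as in lookup
            rows.append(row)
            rhs.append(bit)
        w = solve_gf2(rows, rhs, m)
        if w is not None:
            return seed, w
    raise RuntimeError("no solvable system found; try a larger m")


def lookup(key, seed, w):
    """Evaluate the stored function: XOR the three cells for this key."""
    value = 0
    for p in positions(key, seed, len(w)):
        value ^= w[p]
    return value
```

A possible use, with a toy mapping and deliberately generous slack (twice as many cells as keys, far more than the overheads quoted in the abstract):

```python
mapping = {"apple": 1, "banana": 0, "cherry": 1, "date": 1, "elder": 0}
seed, w = build_static_bit_function(mapping, m=2 * len(mapping))
print(all(lookup(k, seed, w) == v for k, v in mapping.items()))
```

Note that the array stores no keys, so `lookup` on a key outside the mapping returns an arbitrary bit.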

Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.024. For functions whose output has low entropy, we are able to implement feasibly for the first time the Hreinsson–Krøyer–Pagh approach, which makes it possible, for example, to store a function with an output of 10⁶ values distributed following a power law of exponent 2 in just 2.76 bits per key instead of 20.
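The figures in the last example can be sanity-checked with a short calculation. The sketch below assumes that "a power law of exponent 2" means value i occurs with probability proportional to i⁻², which is one reading of the abstract rather than something it states; the function name `power_law_entropy_bits` is hypothetical. It compares the fixed-width cost of writing one of 10⁶ values with the empirical entropy of that distribution, which is the lower bound a compressed static function aims for.

```python
import math


def power_law_entropy_bits(num_values, exponent):
    """Shannon entropy (in bits) of p_i proportional to i**-exponent, i = 1..num_values."""
    weights = [i ** -exponent for i in range(1, num_values + 1)]
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)


# Bits needed to write any one of 10**6 values at fixed width.
fixed_width_bits = math.ceil(math.log2(10 ** 6))

# Entropy of the assumed output distribution.
entropy_bits = power_law_entropy_bits(10 ** 6, 2)

print(fixed_width_bits, entropy_bits)
```

Under this assumption the fixed width is 20 bits, matching the abstract, and the entropy comes out below the reported 2.76 bits per key, so the reported figure sits a modest margin above the entropy bound.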


Updated: 2020-01-15
