Optimistically Compressed Hash Tables & Strings in the USSR
ACM SIGMOD Record (IF 1.1), Pub Date: 2021-06-18, DOI: 10.1145/3471485.3471500
Authors: Tim Gubner, Viktor Leis, Peter Boncz
Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation. Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices. By removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows executing many string operations with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2-4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance, and in micro-benchmarks we observed speedups of up to 25×.
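The abstract names the three techniques without showing their mechanics, so a toy Python sketch of each may help fix the ideas. It is not the paper's or Vectorwise's implementation. Every name here (`bits_needed`, `pack`, `unpack`, `SplitSum`, `HOT_BITS`, `BoundedStringDict`, `strings_equal`) is hypothetical, and the first-come-first-served admission policy of the string dictionary and the non-negative inputs of the split sum are simplifying assumptions.

```python
# Illustrative sketch only: all names, widths, and policies are assumptions,
# not taken from the paper.

# 1) Domain-guided prefix suppression: subtract each column's known minimum
#    and keep only the bits the (min, max) domain actually needs.
def bits_needed(lo, hi):
    """Bits required for any value in [lo, hi] once lo is subtracted."""
    return (hi - lo).bit_length()

def pack(values, domains):
    """Concatenate the narrowed columns of one record into a single integer."""
    word, shift = 0, 0
    for v, (lo, hi) in zip(values, domains):
        assert lo <= v <= hi, "value outside the assumed domain"
        word |= (v - lo) << shift
        shift += bits_needed(lo, hi)
    return word

def unpack(word, domains):
    """Inverse of pack: slice each column out and add its minimum back."""
    values = []
    for lo, hi in domains:
        width = bits_needed(lo, hi)
        values.append((word & ((1 << width) - 1)) + lo)
        word >>= width
    return values

# 2) Optimistic splitting: keep a narrow "hot" slice of each aggregate in the
#    record and spill the rarely needed high bits to a separate "cold" store.
HOT_BITS = 16                      # assumed width of the hot slice
HOT_MAX = (1 << HOT_BITS) - 1

class SplitSum:
    """SUM over non-negative integers, split into hot and cold slices."""
    def __init__(self, groups):
        self.hot = [0] * groups    # stands in for the hash table record
        self.cold = {}             # touched only when the hot slice overflows

    def add(self, group, x):
        s = self.hot[group] + x
        if s > HOT_MAX:            # rare path: move the carry to the cold slice
            self.cold[group] = self.cold.get(group, 0) + (s >> HOT_BITS)
            s &= HOT_MAX
        self.hot[group] = s

    def total(self, group):
        return (self.cold.get(group, 0) << HOT_BITS) | self.hot[group]

# 3) Bounded on-the-fly string dictionary: strings that fit get a small
#    integer id, so equality on them is an integer comparison.
class BoundedStringDict:
    def __init__(self, capacity):
        self.capacity = capacity
        self.ids = {}

    def intern(self, s):
        """Return the id of s, admitting it if there is room, else None."""
        i = self.ids.get(s)
        if i is None and len(self.ids) < self.capacity:
            i = len(self.ids)
            self.ids[s] = i
        return i

def strings_equal(d, a, b):
    ia, ib = d.intern(a), d.intern(b)
    if ia is not None and ib is not None:
        return ia == ib            # fast path: integer logic
    return a == b                  # fallback: ordinary string comparison
```

With the assumed domains `[(1990, 2021), (0, 11), (100, 163)]`, the three columns need 5, 4, and 6 bits, so a record packs into 15 bits rather than three full-width integers. In `SplitSum`, only the 16-bit hot slice would sit in the hash table record, and the cold store is consulted only after an overflow.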
Updated: 2021-06-18