A High Throughput B+tree for SIMD architectures, IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-03-01, DOI: 10.1109/tpds.2019.2942918
Weihua Zhang, Zhaofeng Yan, Yuzhe Lin, Chuanlei Zhao, Lu Peng

B+tree is one of the most important data structures and has been widely used in different fields. With the increase of concurrent queries and data scale in storage, designing an efficient B+tree structure has become critical. Thanks to their abundant computation resources, SIMD architectures offer potential opportunities to achieve high query throughput for B+tree. However, prior methods cannot achieve satisfactory performance because of low resource utilization and poor memory performance. In this paper, we first identify the gaps between B+tree and SIMD architectures. Concurrent B+tree queries involve many global memory accesses and various kinds of divergence, which do not match the features of SIMD architectures. Based on this observation, we propose Harmonia, a novel B+tree structure that bridges these gaps. In Harmonia, a B+tree structure is divided into a key region and a child region. The key region stores the nodes, with their keys, in breadth-first order. The child region is organized as a prefix-sum array, which stores only each node's first child index in the key region. Since the prefix-sum child region is small and the children's indices can be retrieved through index computations, most of it can be kept in on-chip caches, which yields good cache locality. To make it more efficient, Harmonia also includes two optimizations, partially-sorted aggregation and narrowed thread-group traversal, which mitigate memory and execution divergence and improve resource utilization. Evaluations on a 28-core Intel CPU show that Harmonia can achieve up to 207 million queries per second, about 1.7X faster than the CPU-based HB+Tree [1], a recent state-of-the-art solution. On a Volta TITAN V GPU, it can achieve up to 3.6 billion queries per second, about 3.4X faster than the GPU-based HB+Tree.
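The key-region/child-region split can be illustrated with a small sketch. The Python below is a hypothetical, single-threaded model of the layout described in the abstract, not the authors' implementation. The class name `HarmoniaLayout`, the bottom-up bulk build, the default `fanout` of 4, and the use of each child's minimum key as a separator are all assumptions made for illustration. It shows the core idea: nodes are stored breadth-first in one array, and a prefix-sum array `first_child` replaces per-node child pointers by recording where each node's children begin, so child `j` of node `i` sits at `first_child[i] + j`.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
from typing import List


class HarmoniaLayout:
    """Hypothetical sketch of a Harmonia-style B+tree layout (illustration only).

    key_region[i]  -- sorted keys of node i; nodes are stored root-first, breadth-first
    first_child[i] -- key_region index of node i's first child (prefix sum of child counts)
    """

    def __init__(self, sorted_keys: List[int], fanout: int = 4) -> None:
        if not sorted_keys or fanout < 2:
            raise ValueError("need at least one key and fanout >= 2")
        # Leaf level: consecutive runs of up to `fanout` keys (assumed bulk-load scheme).
        nodes = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
        mins = [leaf[0] for leaf in nodes]  # smallest key in each node's subtree
        levels = [(nodes, [0] * len(nodes))]  # (nodes, child counts), built bottom-up
        while len(nodes) > 1:
            parents, parent_mins, counts = [], [], []
            for start in range(0, len(nodes), fanout):
                group = mins[start:start + fanout]  # subtree minima of this parent's children
                parents.append(group[1:])  # separators: min key of every child except the first
                parent_mins.append(group[0])
                counts.append(len(group))
            levels.append((parents, counts))
            nodes, mins = parents, parent_mins
        # Key region: concatenate levels from the root down, i.e. breadth-first order.
        self.key_region = [node for lvl, _ in reversed(levels) for node in lvl]
        child_counts = [c for _, cs in reversed(levels) for c in cs]
        # Child region: node i's children start right after all children of nodes 0..i-1.
        self.first_child = [1 + s for s in accumulate([0] + child_counts[:-1])]
        # All leaves sit on the last level, so every index below this is an internal node.
        self.num_internal = len(self.key_region) - len(levels[0][0])

    def contains(self, key: int) -> bool:
        node = 0  # start at the root
        while node < self.num_internal:
            slot = bisect_right(self.key_region[node], key)  # which child covers `key`
            node = self.first_child[node] + slot  # child index is computed, not stored
        leaf = self.key_region[node]
        i = bisect_left(leaf, key)
        return i < len(leaf) and leaf[i] == key


if __name__ == "__main__":
    tree = HarmoniaLayout(list(range(0, 100, 3)), fanout=4)
    print(tree.contains(42), tree.contains(43))  # True False
```

Because each node's children are contiguous in breadth-first order, `first_child` is just 1 plus the running total of child counts, one integer per node. That compactness is what the abstract credits for keeping most of the child region in on-chip caches. The SIMD-specific parts (partially-sorted aggregation, narrowed thread-group traversal) are not modeled in this sketch.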

Updated: 2020-03-01