SAL-hashing: A Self-Adaptive Linear Hashing Index for SSDs, IEEE Transactions on Knowledge and Data Engineering


IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-03-01, DOI: 10.1109/tkde.2018.2884714
Peiquan Jin, Chengcheng Yang, Xiaoliang Wang, Lihua Yue, Dezhi Zhang

Flash-memory-based solid state drives (SSDs) have emerged as an alternative to magnetic disks owing to their high performance and low power consumption. However, random writes on SSDs are much slower than reads. Traditional index structures, which were designed around the symmetric I/O behavior of magnetic disks, therefore cannot fully exploit the performance of SSDs. In this paper, we propose an SSD-optimized linear hashing index called Self-Adaptive Linear Hashing (SAL-hashing), which reduces the small random writes to SSDs that index operations cause. Our work makes five contributions. First, we organize the buckets of a linear hashing index into groups and sets to facilitate coarse-grained writes and adaptivity to access patterns. A group consists of a fixed number of buckets; it turns small random writes to individual buckets into coarse-grained writes and thereby improves the write performance of the index. A set consists of a number of groups, and each set can employ its own split strategy. With this mechanism, SAL-hashing is able to adapt to changes in access patterns. Second, we attach a log region to each set and amortize the cost of reads and writes by committing updates to the log region in batches. Third, to reduce search cost, each log region is equipped with Bloom filters that index its update logs, and we devise a cost-based online algorithm that adaptively merges the log region with the corresponding set when the set becomes search-intensive. Fourth, we propose a new technique called virtual split to optimize the search performance of SAL-hashing. Finally, we propose a new scheme for managing the log buffer. We conduct extensive experiments on real SSDs. The results suggest that our proposal adapts itself to changes in access patterns and outperforms several competitors under various workloads.
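The abstract gives no pseudocode, so the following is only a minimal in-memory sketch of three ideas it names: linear hashing addressing, the bucket-to-group-to-set layout, and a per-set log region guarded by a Bloom filter. Every name and constant here (`BUCKETS_PER_GROUP`, `GROUPS_PER_SET`, `bucket_address`, `locate`, `BloomFilter`, `SetWithLog`) is a hypothetical choice made for illustration and is not taken from the paper. The cost-based merge trigger, virtual split, and log-buffer management are not modeled; `merge()` is simply called by hand.

```python
import hashlib

BUCKETS_PER_GROUP = 4  # assumed value; the abstract only says "a fixed number"
GROUPS_PER_SET = 2     # assumed value; the abstract does not give a set size


def bucket_address(key_hash: int, level: int, split_ptr: int, n0: int = 4) -> int:
    """Textbook linear hashing: use h_level, or h_(level+1) if the bucket was split."""
    addr = key_hash % (n0 * 2 ** level)
    if addr < split_ptr:  # buckets below the split pointer have already been split
        addr = key_hash % (n0 * 2 ** (level + 1))
    return addr


def locate(bucket: int) -> tuple[int, int]:
    """Map a bucket number to (set id, group id) under the assumed fixed sizes."""
    group = bucket // BUCKETS_PER_GROUP
    return group // GROUPS_PER_SET, group


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, bits: int = 1024, hashes: int = 3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.array |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all((self.array >> p) & 1 for p in self._positions(key))


class SetWithLog:
    """One set: base buckets plus a log region indexed by a Bloom filter."""

    def __init__(self):
        self.base = {}       # stands in for the set's buckets on the SSD
        self.log = {}        # stands in for the log region (latest update per key)
        self.bloom = BloomFilter()
        self.log_probes = 0  # how many lookups had to read the log region

    def put(self, key: str, value) -> None:
        # Updates go to the log region instead of being written to a bucket in place.
        self.log[key] = value
        self.bloom.add(key)

    def get(self, key: str):
        # The Bloom filter lets most lookups for unlogged keys skip the log region.
        if self.bloom.might_contain(key):
            self.log_probes += 1
            if key in self.log:
                return self.log[key]
        return self.base.get(key)

    def merge(self) -> None:
        # Fold the log into the base buckets in one batch, then reset the log.
        self.base.update(self.log)
        self.log.clear()
        self.bloom = BloomFilter()
```

In this sketch a lookup pays for a log read only when the Bloom filter reports a possible hit, which is the search-cost saving the abstract attributes to the filters; deciding when to call `merge()` is where the paper's cost-based online algorithm would apply.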


Updated: 2020-03-01
