
Toward a Better Understanding and Evaluation of Tree Structures on Flash SSDs
arXiv - CS - Databases Pub Date : 2020-06-08 , DOI: arxiv-2006.04658
Diego Didona, Nikolas Ioannou, Radu Stoica, Kornilios Kourtis

Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low-latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they form the backbone of many of the data management systems used in production and research today. In this paper, we show that benchmarking a persistent tree-based data structure on an SSD is a complex process, which can easily incur subtle pitfalls that lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On the one hand, tree structures implement internal operations that have nontrivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, which are well known to lead to complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls, both to obtain more reliable performance measurements and to perform more thorough and fair comparisons among different design points.
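The abstract notes that SSD firmware produces complex performance dynamics. As a result, a throughput figure taken too early in a run can misrepresent how a tree structure behaves over time. The Python sketch below shows one generic safeguard: it checks whether per-interval throughput samples have flattened out before a result is reported. This is a hedged illustration and is not presented as one of the paper's seven guidelines. The function name `reached_steady_state`, the two-window comparison, and the 5% default tolerance are all assumptions chosen for illustration.

```python
from statistics import mean


def reached_steady_state(samples, window, tolerance=0.05):
    """Hypothetical helper: return True if the mean of the last `window`
    throughput samples differs from the mean of the preceding `window`
    samples by at most `tolerance` (as a relative difference)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < 2 * window:
        return False  # not enough data to compare two full windows
    previous = mean(samples[-2 * window:-window])
    latest = mean(samples[-window:])
    if previous == 0:
        return latest == 0  # avoid dividing by zero on an idle series
    return abs(latest - previous) / previous <= tolerance


# Synthetic per-interval throughput (ops/s), invented for illustration only:
# an initial burst, then a drop, then a plateau.
synthetic = [900, 880, 870, 600, 420, 410, 405, 400, 402, 398]
print(reached_steady_state(synthetic[:6], window=3))  # still in the burst/drop
print(reached_steady_state(synthetic, window=3))      # plateau reached
```

A two-window comparison is only a simple heuristic. Longer windows, or several consecutive stable windows, make the check less sensitive to noise. The appropriate window length for a given drive and workload is an open assumption here.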

Updated: 2020-06-09