Real-Time LSM-Trees for HTAP Workloads
arXiv - CS - Databases, Pub Date: 2021-01-17, DOI: arxiv-2101.06801
Hemant Saxena, Lukasz Golab, Stratos Idreos, Ihab F. Ilyas

Real-time data analytics systems such as SAP HANA, MemSQL, and IBM Wildfire employ hybrid data layouts, in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high data rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge (LSM) Tree is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification to allow different data layouts in different levels, ranging from purely row-oriented to purely column-oriented, leading to a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER - a prototype implementation of our idea built on top of the RocksDB key-value store. In our evaluation, LASER is almost 5x faster than Postgres (a pure row-store) and two orders of magnitude faster than MonetDB (a pure column-store) for real-time data analytics workloads.
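The core idea in the abstract, that each level of the tree may use a different data layout, can be illustrated with a toy in-memory model. The sketch below is not LASER's implementation and does not use RocksDB: the class `ToyLifecycleStore`, its methods, and the per-level "column group" configuration are all hypothetical names chosen for illustration. It also ignores sorted runs, deletes, and disk I/O, and uses the number of stored values touched as a rough stand-in for scan cost.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]
Layout = List[Tuple[str, ...]]  # one tuple of column names per column group


class ToyLifecycleStore:
    """Toy model (hypothetical): every level splits rows into its own column groups."""

    def __init__(self, layouts: List[Layout], capacity: int) -> None:
        self.layouts = layouts      # layouts[i] = column groups used at level i
        self.capacity = capacity    # max keys in a level before it spills down
        # levels[i][g] maps key -> tuple of values for column group g at level i
        self.levels = [[{} for _ in groups] for groups in layouts]

    def put(self, key: int, row: Row) -> None:
        """Write to level 0 (the most row-oriented level), then compact if full."""
        self._write(0, key, row)
        self._compact(0)

    def get(self, key: int) -> Optional[Row]:
        """Point lookup: search levels top-down so newer versions win."""
        for level in range(len(self.levels)):
            if key in self.levels[level][0]:
                return self._read_row(level, key)
        return None

    def scan_column(self, col: str) -> Tuple[Dict[int, int], int]:
        """Return {key: value} for one column plus the count of values touched."""
        values: Dict[int, int] = {}
        touched = 0
        for level, groups in enumerate(self.layouts):
            g = next(i for i, cols in enumerate(groups) if col in cols)
            idx = groups[g].index(col)
            for key, vals in self.levels[level][g].items():
                touched += len(vals)                # wide groups cost more per key
                values.setdefault(key, vals[idx])   # keep the newest version only
        return values, touched

    def _write(self, level: int, key: int, row: Row) -> None:
        for g, cols in enumerate(self.layouts[level]):
            self.levels[level][g][key] = tuple(row[c] for c in cols)

    def _read_row(self, level: int, key: int) -> Row:
        row: Row = {}
        for g, cols in enumerate(self.layouts[level]):
            row.update(zip(cols, self.levels[level][g][key]))
        return row

    def _compact(self, level: int) -> None:
        """Move a full level down, re-splitting rows into the next level's groups."""
        if level + 1 >= len(self.levels):
            return  # the last level is unbounded in this toy model
        if len(self.levels[level][0]) <= self.capacity:
            return
        for key in list(self.levels[level][0]):
            self._write(level + 1, key, self._read_row(level, key))
        for group in self.levels[level]:
            group.clear()
        self._compact(level + 1)


# Level 0 is purely row-oriented; level 1 is purely column-oriented.
store = ToyLifecycleStore(
    layouts=[[("a", "b", "c")], [("a",), ("b",), ("c",)]],
    capacity=2,
)
for k in range(1, 6):
    store.put(k, {"a": k, "b": 10 * k, "c": 100 * k})
```

In this toy setup, recent keys stay in the single wide group at level 0, so a point lookup reads one entry, while older keys that have moved to level 1 can be scanned one column at a time without touching the other columns. A real design would choose the column groups per level from a workload-driven cost model, as the abstract describes.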

Updated: 2021-01-19