PLI+: efficient clustering of cloud databases, Distributed and Parallel Databases


PLI+: efficient clustering of cloud databases
Distributed and Parallel Databases (IF 1.5), Pub Date: 2018-10-06, DOI: 10.1007/s10619-018-7252-2
Dai Hai Ton That, James Wagner, Alexander Rasin, Tanu Malik

Commercial cloud database services increase the availability of data and provide reliable access to it. Routine database maintenance tasks such as clustering, however, increase the cost of hosting data on commercial cloud instances. Clustering causes an I/O burst; clustering in one shot depletes the I/O credits accumulated by an instance and increases the cost of hosting data. An unclustered database, in turn, degrades query performance because queries scan large amounts of data, which gradually depletes I/O credits. In this paper, we introduce Physical Location Index Plus (PLI+), an indexing method for databases hosted on commercial clouds. PLI+ relies on internal knowledge of the data layout to build a physical location index, which associates a range of physical locations with a range of attribute values to create approximately sorted buckets. As new data is inserted, writes are partitioned in memory based on the incoming data distribution. The data is then written to physical locations on disk in block-based partitions to favor large-granularity I/O. Incoming SQL queries on indexed attribute values are rewritten in terms of the physical location ranges. As a result, PLI+ does not decrease query performance on an unclustered cloud database instance, and DBAs may choose to cluster the instance when they have sufficient I/O credits available, thus delaying the need for clustering. We evaluate query performance over PLI+ by comparing it with clustered indexes, unclustered (secondary) indexes, and log-structured merge trees on real datasets. Experiments show that PLI+ significantly delays clustering without degrading query performance, while achieving a higher level of sortedness than unclustered indexes and log-structured merge trees. We also evaluate the quality of clustering, by introducing a measure of interval sortedness, and the size of the index.
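To make the idea of a physical location index more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes pages are listed in physical order, that each bucket covers a fixed number of consecutive pages, and that the engine exposes a physical location through a pseudo-column, here given the hypothetical name `page_no`. The function names `build_index`, `rewrite_range_query`, and `to_predicate` are likewise illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bucket:
    lo: int          # smallest attribute value stored in the bucket
    hi: int          # largest attribute value stored in the bucket
    first_page: int  # first physical page covered by the bucket
    last_page: int   # last physical page covered by the bucket


def build_index(pages: List[List[int]], pages_per_bucket: int) -> List[Bucket]:
    """Summarize consecutive pages (in physical order) into value-range buckets."""
    buckets = []
    for start in range(0, len(pages), pages_per_bucket):
        chunk = pages[start:start + pages_per_bucket]
        values = [v for page in chunk for v in page]
        if values:  # skip buckets whose pages are all empty
            buckets.append(
                Bucket(min(values), max(values), start, start + len(chunk) - 1)
            )
    return buckets


def rewrite_range_query(buckets: List[Bucket], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Return merged physical page ranges that may hold values in [lo, hi]."""
    ranges: List[Tuple[int, int]] = []
    for b in buckets:
        if b.hi >= lo and b.lo <= hi:  # bucket value range overlaps the query
            if ranges and ranges[-1][1] + 1 == b.first_page:
                # Adjacent to the previous range: extend it for larger sequential reads.
                ranges[-1] = (ranges[-1][0], b.last_page)
            else:
                ranges.append((b.first_page, b.last_page))
    return ranges


def to_predicate(ranges, attr, lo, hi, loc_col="page_no"):
    """Render a SQL-like predicate; `loc_col` is a hypothetical pseudo-column."""
    if not ranges:
        return "FALSE"
    loc = " OR ".join(f"{loc_col} BETWEEN {a} AND {b}" for a, b in ranges)
    # The original attribute filter is kept because buckets are only approximately sorted.
    return f"({loc}) AND {attr} BETWEEN {lo} AND {hi}"


# Six pages of approximately sorted values, two pages per bucket.
pages = [[1, 3], [2, 5], [7, 9], [8, 12], [20, 25], [22, 30]]
index = build_index(pages, pages_per_bucket=2)
print(to_predicate(rewrite_range_query(index, 4, 8), "ts", 4, 8))
# prints: (page_no BETWEEN 0 AND 3) AND ts BETWEEN 4 AND 8
```

In this sketch, a query on the attribute range [4, 8] touches only pages 0 to 3 and skips the third bucket entirely, which is the kind of scan reduction the abstract attributes to rewriting queries in terms of physical location ranges.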


Updated: 2018-10-06
