DistriPlan: an optimized join execution framework for geo-distributed scientific data, Distributed and Parallel Databases


DistriPlan: an optimized join execution framework for geo-distributed scientific data
Distributed and Parallel Databases (IF 1.2), Pub Date: 2019-03-23, DOI: 10.1007/s10619-019-07264-z
Roee Ebenstein, Gagan Agrawal

Scientific data is frequently stored across geographically distributed data repositories. Although there have been recent efforts to query scientific datasets using structured query operators, they have not yet supported joins across distributed data repositories. This paper describes a framework that supports join-like operations over multi-dimensional array datasets that are spread across multiple sites. More specifically, we first formally define join operations over array datasets and establish how they arise in the context of scientific data analysis. We then describe a methodology for optimizing such operations—components of our approach include enumeration algorithms for candidate plans, methods for pruning plans before they are enumerated, and a detailed cost model for selecting the best (cheapest) plan. We evaluate our approach using candidate queries, and show that the optimization effort is practical and profitable—query performance was improved significantly using our approach.


Updated: 2019-03-23
