Distributed arrays: an algebra for generic distributed query processing
Distributed and Parallel Databases (IF 1.5), Pub Date: 2021-04-05, DOI: 10.1007/s10619-021-07325-2
Ralf Hartmut Güting, Thomas Behr, Jan Kristof Nidzwetzki

We propose a simple model for distributed query processing based on the concept of a distributed array. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations, called the basic algebra, implemented by some piece of software called the basic engine. The basic engine provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine, called the master, which controls a set of basic engines called the workers. The master maps operations on distributed arrays to the respective operations on their fields, executed by the workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution in itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. The Secondo system is used as the basic engine; it is a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, and a DBScan implementation.
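The master/worker mapping described in the abstract can be illustrated with a minimal Python sketch. This is not the Secondo implementation or its syntax: the names `DArray`, `dmap`, `worker_of` and `partition` are hypothetical, the round-robin placement of fields is an assumption, and local threads merely stand in for worker engines on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class DArray(Generic[T]):
    """One-dimensional array of fields (hypothetical model, not Secondo's type)."""
    fields: List[T]      # field values; in a real system these live on workers
    num_workers: int     # number of worker engines controlled by the master

    def worker_of(self, index: int) -> int:
        # Assumed placement: fields are assigned to workers round-robin.
        return index % self.num_workers

    def dmap(self, op: Callable[[T], U]) -> "DArray[U]":
        # The master applies the single-machine operation `op` to every field.
        # Threads stand in for workers; results keep the original field order.
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            results = list(pool.map(op, self.fields))
        return DArray(results, self.num_workers)


def partition(items: List[T], num_fields: int, num_workers: int) -> DArray[List[T]]:
    # Split a data set into `num_fields` partitions by position (assumed scheme).
    parts: List[List[T]] = [[] for _ in range(num_fields)]
    for i, item in enumerate(items):
        parts[i % num_fields].append(item)
    return DArray(parts, num_workers)


data = partition(list(range(10)), num_fields=4, num_workers=2)
counts = data.dmap(len)   # any single-machine operation can be passed unchanged
print(counts.fields)      # [3, 3, 2, 2]
```

Because `dmap` accepts any callable, a function written for a single partition becomes usable on the whole distributed array without changes, which is the genericity property the abstract claims for the distributed algebra.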




Updated: 2021-04-06