Algorithms for a Topology-aware Massively Parallel Computation Model, arXiv - CS - Databases



arXiv - CS - Databases, Pub Date: 2020-09-24, DOI: arxiv-2009.11463
Xiao Hu, Paraschos Koutris, Spyros Blanas

Most prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability and can communicate with every other unit with the same latency and bandwidth. However, this strong assumption of a uniform topology rarely holds in practical settings, where computing units are connected through complex networks. To address this issue, Blanas et al. recently proposed a topology-aware massively parallel computation model that integrates the network structure and heterogeneity into the cost model. The network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between its two endpoints. The computation proceeds in synchronous rounds, and the cost of each round is the maximum cost over all edges in the network. In this work, we take the first step toward investigating three fundamental data processing tasks in this topology-aware parallel model: set intersection, Cartesian product, and sorting. We focus on tree topologies and present both lower bounds and (asymptotically) matching upper bounds. The optimality of our algorithms is with respect to the initial data distribution among the network nodes, instead of assuming a worst-case distribution as in previous results. Apart from the theoretical optimality of our results, our protocols are simple and use a constant number of rounds, and we believe they can be implemented in practical settings as well.
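The round cost described in the abstract (the maximum, over all directed edges, of a per-edge cost function applied to the data sent along that edge) can be illustrated with a minimal Python sketch. The abstract does not give the form of the cost function, so the linear cost `data / bandwidth`, the node names `a`, `b`, `r`, and the helper names `round_cost` and `linear` are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]  # directed edge (source, destination)


def linear(bandwidth: float) -> Callable[[float], float]:
    """Assumed cost function: data volume divided by edge bandwidth."""
    return lambda data: data / bandwidth


def round_cost(traffic: Dict[Edge, float],
               cost_fn: Dict[Edge, Callable[[float], float]]) -> float:
    """Cost of one synchronous round: the maximum edge cost in the network.

    Edges that carry no data in this round are charged for 0 units.
    """
    return max((f(traffic.get(edge, 0.0)) for edge, f in cost_fn.items()),
               default=0.0)


# Hypothetical tree topology: compute nodes a and b attached to a router r.
# The links to b are slower than the links to a (heterogeneous network).
cost_fn = {
    ("a", "r"): linear(10.0),
    ("r", "a"): linear(10.0),
    ("b", "r"): linear(2.0),
    ("r", "b"): linear(2.0),
}

# One round: a sends 40 units to b, and b sends 6 units to a, both via r.
traffic = {
    ("a", "r"): 40.0,
    ("r", "b"): 40.0,
    ("b", "r"): 6.0,
    ("r", "a"): 6.0,
}

print(round_cost(traffic, cost_fn))  # the slow edge r -> b dominates: 20.0
```

In this sketch the round is bottlenecked by the slow edge into `b`, which is the kind of effect a uniform-topology model cannot express: the same traffic pattern would cost far less if the large transfer were directed toward `a` instead.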


Updated: 2020-09-25
