Optimal Load Allocation for Coded Distributed Computation in Heterogeneous Clusters,IEEE Transactions on Communications



IEEE Transactions on Communications (IF 8.3), Pub Date: 2021-01-01, DOI: 10.1109/tcomm.2020.3030667
Daejin Kim, Hyegyeong Park, Jun Kyun Choi

Coding has recently emerged as a useful technique for mitigating the effect of stragglers in distributed computing. However, coding in this context has mainly been explored under the assumption of homogeneous workers, although real-world clusters often consist of heterogeneous workers with different computing capabilities. Uniform load allocation that ignores this heterogeneity can cause a significant loss in latency. In this article, we propose the optimal load allocation for coded distributed computing with heterogeneous workers. Specifically, we focus on the scenario in which some workers share the same computing capability and can therefore be regarded as a group for analysis. We rely on a lower bound on the expected latency and obtain the optimal load allocation by showing that our load allocation achieves the minimum of the lower bound for a sufficiently large number of workers. Given the proposed optimal load allocation, we derive the optimal code rate that achieves the minimum expected latency. In numerical simulations under the group heterogeneity assumption, our load allocation reduces the expected latency by orders of magnitude compared with the existing scheme. Furthermore, in experiments on Amazon EC2 for scenarios with distinct straggler/heterogeneity patterns, our scheme outperforms the competing schemes, reducing the total finishing time by up to 52%.
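
To make the latency gap between uniform and heterogeneity-aware allocation concrete, the following is a minimal Monte Carlo sketch and not the authors' method. Everything in it is an assumption for illustration: the shifted-exponential runtime model (a worker with load l finishes after l * (a + Exp(mu)) time units), the MDS-style decoding rule (the master finishes once the completed workers together cover k coded rows), the two-group parameters, and the speed-proportional heuristic, which merely stands in for the paper's optimal allocation because the abstract does not state its closed form. All function names are hypothetical.

```python
import random


def expected_latency(loads, shifts, rates, k, trials=2000, seed=0):
    """Monte Carlo estimate of the time until finished workers cover k rows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed model: a worker with load l needs l * (a + Exp(mu)) time units.
        finish = sorted(
            (load * (a + rng.expovariate(mu)), load)
            for load, a, mu in zip(loads, shifts, rates)
        )
        done = 0.0
        for t, load in finish:
            done += load
            if done >= k:  # assumption: any k coded rows suffice to decode
                total += t
                break
        else:
            raise ValueError("total assigned load is below k")
    return total / trials


def uniform_loads(n_workers, total_rows):
    """Give every worker the same number of coded rows."""
    return [total_rows / n_workers] * n_workers


def speed_proportional_loads(shifts, rates, total_rows):
    """Heuristic (not the paper's allocation): load proportional to
    1 / (mean time per row), where the mean time per row is a + 1/mu."""
    speeds = [1.0 / (a + 1.0 / mu) for a, mu in zip(shifts, rates)]
    scale = total_rows / sum(speeds)
    return [s * scale for s in speeds]


def demo(k=1000, code_rate=0.5):
    """Compare both allocations on two hypothetical groups of 10 workers."""
    shifts = [0.25] * 10 + [1.0] * 10   # fast group, then slow group
    rates = [4.0] * 10 + [1.0] * 10
    total_rows = k / code_rate          # coded rows to distribute
    uni = expected_latency(uniform_loads(20, total_rows), shifts, rates, k)
    prop = expected_latency(
        speed_proportional_loads(shifts, rates, total_rows), shifts, rates, k
    )
    return uni, prop


if __name__ == "__main__":
    uni, prop = demo()
    print(f"uniform: {uni:.1f}  speed-proportional: {prop:.1f}")
```

Under these assumed parameters the speed-proportional heuristic gives a lower estimated latency than uniform allocation, which illustrates the direction of the effect described in the abstract. It says nothing about the magnitude reported in the paper, and changing `code_rate` in `demo` only hints at why the code rate is itself worth optimizing.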


Updated: 2021-01-01
