On Gradient Coding with Partial Recovery, arXiv - CS - Distributed, Parallel, and Cluster Computing



arXiv - CS - Distributed, Parallel, and Cluster Computing, Pub Date: 2021-02-19, DOI: arxiv-2102.10163
Sahasrajit Sarmasarkar, V. Lalitha, Nikhil Karamchandani

We consider a generalization of the recently proposed gradient coding framework where a large dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over the data subsets assigned to it. Unlike the conventional framework which requires the master node to recover the sum of the gradients over all the data subsets in the presence of $s$ straggler workers, we relax the goal of the master node to computing the sum of at least some $\alpha$ fraction of the gradients. The broad goal of our work is to study the optimal computation and communication load per worker for this approximate gradient coding framework. We begin by deriving a lower bound on the computation load of any feasible scheme and also propose a strategy which achieves this lower bound, albeit at the cost of high communication load and a number of data partitions which can be polynomial in the number of workers $n$. We then restrict attention to schemes which utilize a number of data partitions equal to $n$ and propose schemes based on cyclic assignment which have a lower communication load. When each worker transmits a single linear combination, we also prove lower bounds on the computation load of any scheme using $n$ data partitions.
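The trade-off between per-worker computation load, the number of stragglers $s$, and the recoverable fraction $\alpha$ can be explored with a small brute-force check. The sketch below is only an illustration under simplifying assumptions, not the scheme from the paper: it uses $n$ data partitions with a cyclic assignment of $d$ consecutive partitions per worker, and it assumes each non-straggling worker's partitions are individually available to the master (the uncoded, high-communication case), so it ignores the linear-combination encoding entirely. The function names and the parameter `d` are hypothetical.

```python
from itertools import combinations


def cyclic_assignment(n, d):
    """Worker i holds partitions i, i+1, ..., i+d-1 (mod n)."""
    return [frozenset((i + k) % n for k in range(d)) for i in range(n)]


def worst_case_coverage(n, d, s):
    """Smallest number of partitions held by at least one non-straggler,
    taken over every possible set of s straggling workers (brute force)."""
    assignment = cyclic_assignment(n, d)
    worst = n
    for stragglers in combinations(range(n), s):
        alive = set(range(n)) - set(stragglers)
        covered = set()
        for w in alive:
            covered |= assignment[w]
        worst = min(worst, len(covered))
    return worst


def supports_partial_recovery(n, d, s, alpha):
    """True if, for every set of s stragglers, at least an alpha fraction
    of the n partitions is still held by some non-straggling worker."""
    return worst_case_coverage(n, d, s) >= alpha * n


if __name__ == "__main__":
    n, s = 6, 3
    for d in range(1, 4):
        print(d, worst_case_coverage(n, d, s))
```

Because the check enumerates all straggler sets, it is only practical for small $n$; it is meant for building intuition about how increasing `d` raises the guaranteed coverage.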


Updated: 2021-02-23
