Coded Distributed Computing with Partial Recovery
arXiv - CS - Information Theory, Pub Date: 2020-07-04, DOI: arxiv-2007.02191
Emre Ozfatura, Sennur Ulukus, and Deniz Gunduz

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed distributed computation scheme with partial recovery in terms of the trade-off between the computation accuracy and latency.
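
The accuracy-versus-latency trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's CCPR construction: it is a hypothetical uncoded toy in which each worker is assigned a few rows of the matrix in cyclically shifted order, reports each inner product as soon as it finishes, and the master stops once a target fraction of distinct rows has arrived. The function name `simulate_partial_recovery`, the one-worker-per-row layout, and the exponential per-task delay model are all assumptions made for illustration.

```python
import math
import random


def simulate_partial_recovery(A, x, load, target_fraction, seed=0):
    """Toy model of partial recovery for y = A x (illustrative assumptions only).

    There is one worker per row of A. Worker w computes rows
    w, w+1, ..., w+load-1 (mod number of rows) one after another and
    sends each result as soon as it is done. Per-task compute times are
    drawn from an exponential distribution (an assumed straggler model).
    Returns the time at which the target fraction of distinct rows has
    been received, together with the recovered entries of y.
    """
    rng = random.Random(seed)
    n_rows = len(A)

    # Build the list of (finish_time, row_index) events for all workers.
    events = []
    for worker in range(n_rows):
        elapsed = 0.0
        for j in range(load):
            elapsed += rng.expovariate(1.0)
            events.append((elapsed, (worker + j) % n_rows))
    events.sort()

    needed = math.ceil(target_fraction * n_rows)
    recovered = {}
    for finish_time, row in events:
        if row not in recovered:
            recovered[row] = sum(a * b for a, b in zip(A[row], x))
        if len(recovered) >= needed:
            return finish_time, recovered

    # Only reached if the target cannot be met (e.g. target_fraction > 1).
    return (events[-1][0] if events else 0.0), recovered


if __name__ == "__main__":
    A = [[float(i + j) for j in range(3)] for i in range(6)]
    x = [1.0, 2.0, 3.0]
    for fraction in (0.5, 1.0):
        latency, partial = simulate_partial_recovery(A, x, load=2, target_fraction=fraction)
        print(f"target={fraction:.1f} rows_recovered={len(partial)} latency={latency:.3f}")
```

With a fixed seed the completion events are identical across calls, so lowering `target_fraction` can only shorten the reported latency; the entries of the product that are missing from the returned dictionary are the accuracy given up in exchange.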

Updated: 2020-07-07