Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing,arXiv - CS - Distributed, Parallel, and Cluster Computing


Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-09-17, DOI: arxiv-2009.08327
Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

A major challenge in using distributed learning to train complex models on large data sets is the straggler effect. Coded computation has recently been proposed as a solution that efficiently adds redundancy to the computation tasks. In this technique, coding is applied across the data sets and computation is performed on the coded data, so that the results from any subset of worker nodes of a certain size are enough to recover the final results. These approaches face three major challenges: (1) they are limited to polynomial function computations; (2) the number of servers whose results must be waited for grows with the product of the data set size and the model complexity (the degree of the polynomial), which can be prohibitively large; and (3) they are not numerically stable for computation over the real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC) as an alternative approach that is not limited to polynomial function computation. In addition, the master node can approximately compute the final results using the outcomes of any arbitrary subset of available worker nodes. The approximation is proven to be numerically stable and has low computational complexity. Its accuracy is established theoretically and verified by simulation results in different settings, such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, where it outperforms repetitive computation (repetition coding) in terms of the rate of convergence.
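
The abstract does not spell out the encoding and decoding steps, so the following is only a minimal sketch of how a scheme of this kind might look, assuming Berrut's rational interpolant with alternating-sign weights is used both to mix the data into coded inputs and to approximate the results from whichever workers respond. The node choices (Chebyshev points of the first kind for the data, of the second kind for the workers), the re-indexing of signs over the returned subset, and all function names (`berrut_weights`, `bacc_round`, and so on) are assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

def chebyshev_first_kind(k):
    # Assumed interpolation points attached to the k data blocks.
    j = np.arange(k)
    return np.cos((2 * j + 1) * np.pi / (2 * k))

def chebyshev_second_kind(n):
    # Assumed evaluation points attached to the n workers (n >= 2).
    i = np.arange(n)
    return np.cos(i * np.pi / (n - 1))

def berrut_weights(z, nodes):
    """Basis weights of Berrut's rational interpolant at points z.

    `nodes` must be monotone (increasing or decreasing) so that the
    alternating signs follow the node ordering. Each row sums to 1.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    nodes = np.asarray(nodes, dtype=float)
    diff = z[:, None] - nodes[None, :]
    signs = (-1.0) ** np.arange(len(nodes))
    hit = np.isclose(diff, 0.0)            # z coincides with a node
    terms = signs / np.where(hit, 1.0, diff)
    w = terms / terms.sum(axis=1, keepdims=True)
    rows = hit.any(axis=1)
    w[rows] = hit[rows].astype(float)      # interpolation: return the node value
    return w

def bacc_round(X, f, n_workers, returned):
    """One hypothetical encode / compute / decode round.

    X        : (K, d) array, one data block per row
    f        : function applied by each worker to its coded block
    returned : indices of the workers whose results arrived
    """
    X = np.asarray(X, dtype=float)
    alpha = chebyshev_first_kind(len(X))
    beta = chebyshev_second_kind(n_workers)
    coded = berrut_weights(beta, alpha) @ X        # one coded block per worker
    results = np.stack([f(c) for c in coded])      # stands in for the workers
    idx = np.sort(np.asarray(returned))            # keep the node order monotone
    # Interpolate the returned results and read them off at the data points.
    return berrut_weights(alpha, beta[idx]) @ results[idx]
```

For instance, `bacc_round(X, np.tanh, n_workers=12, returned=[0, 2, 3, 7, 8, 11])` returns an approximation of `np.tanh(X)` from six of twelve workers. The function `f` need not be a polynomial, and the decoder accepts any subset of at least one worker, with accuracy depending on how many results arrive.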


Updated: 2020-09-18
