Serverless Straggler Mitigation using Local Error-Correcting Codes
arXiv - CS - Information Theory, Pub Date: 2020-01-21, DOI: arxiv-2001.07490
Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication, and evaluate them on several common applications from machine learning and high-performance computing. The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. This creates a fully distributed computing framework that does not rely on a master node for encoding or decoding, which removes the computation, communication, and storage bottlenecks at the master. On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability. Through extensive experiments, we show that our scheme outperforms existing schemes, such as speculative execution and other coding-theoretic methods, by at least 25%.
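The abstract does not spell out the code construction, so the following is only a minimal sketch of the general idea of a local code for matrix multiplication, under stated assumptions. It assumes the row blocks of A are split into small groups, each group gets a single parity block (the sum of its blocks), and each block is multiplied by B by a separate worker. A straggling worker is then recovered from the other results in its own group only. The function names (`encode_groups`, `decode_group`, `coded_matmul`), the single-parity layout, and the simulated `stragglers` set are all hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_groups(blocks, group_size):
    """Split blocks into groups and append one parity block (the group sum) to each.

    Assumption: a single-parity local code, chosen for illustration only.
    """
    groups = []
    for i in range(0, len(blocks), group_size):
        group = blocks[i:i + group_size]
        parity = group[0]
        for blk in group[1:]:
            parity = add(parity, blk)
        groups.append(group + [parity])
    return groups

def decode_group(results):
    """Return the data products of one group; None marks a straggler.

    The last entry is the parity product. At most one missing entry per
    group can be repaired, using only results from that same group.
    """
    missing = [i for i, r in enumerate(results) if r is None]
    if len(missing) > 1:
        raise ValueError("more than one straggler in a group; cannot decode locally")
    if missing and missing[0] != len(results) - 1:
        # Missing data product = parity product - sum of the other data products,
        # because (A1 + A2 + ...) B = A1 B + A2 B + ...
        recovered = results[-1]
        for i, r in enumerate(results[:-1]):
            if i != missing[0]:
                recovered = sub(recovered, r)
        results = results[:missing[0]] + [recovered] + results[missing[0] + 1:]
    return results[:-1]

def coded_matmul(a, b, rows_per_block, group_size, stragglers):
    """Compute A @ B with simulated stragglers.

    `stragglers` is a hypothetical set of (group index, worker index) pairs
    whose results never arrive. Assumes len(a) is divisible by rows_per_block.
    """
    blocks = [a[i:i + rows_per_block] for i in range(0, len(a), rows_per_block)]
    out = []
    for g, group in enumerate(encode_groups(blocks, group_size)):
        results = [None if (g, w) in stragglers else matmul(blk, b)
                   for w, blk in enumerate(group)]
        for product in decode_group(results):
            out.extend(product)
    return out

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [2, 1]]
# One straggler in each group: a data worker in group 0, the parity worker in group 1.
C = coded_matmul(A, B, rows_per_block=1, group_size=2, stragglers={(0, 1), (1, 2)})
```

In this sketch each group is decoded independently, so repairing a straggler touches only the few results in its group rather than the whole output. That locality is what would let encoding and decoding run in parallel on workers instead of on a master node. The sketch tolerates only one straggler per group and raises an error otherwise.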

Updated: 2020-01-22