Pushing the Cloud Limits in Support of IceCube Science, IEEE Internet Computing

IEEE Internet Computing (IF 3.2), Pub Date: 2021-02-19, DOI: 10.1109/mic.2020.3045209
Igor Sfiligoi, David Schultz, Frank Wurthwein, Benedikt Riedel

Scientific high-throughput computing needs are growing dramatically with time, and public Clouds have become an attractive option for occasional bursts due to their ability to be provisioned with minimal advance notice. However, the available capacity of both compute and networking is not well understood. This article presents the results of several production runs of the IceCube collaboration that temporarily expanded its batch system environment with GPU-providing compute instances from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. The aim of these Cloud bursts was to push the limits of Cloud compute, with a particular emphasis on GPU-providing instances. On the compute side, we showed that it is possible to reach peaks of over 380 fp32 PFLOPS using all available GPU-providing instance types, and to integrate over 1 fp32 EFLOP hour in a single workday by using only the most cost-effective ones. On the network side, we showed intra-Cloud network throughputs of over 1 Tbps, and throughputs of 100 Gbps toward on-prem storage using both shared peering arrangements and dedicated network links.
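As a back-of-the-envelope check on the headline numbers, the following sketch converts them into comparable units. It assumes an 8-hour workday (the abstract does not state how long the workday was) and decimal units (1 EFLOPS = 1000 PFLOPS, 1 TB = 10^12 bytes). The function names are hypothetical helpers written for illustration, not code from the paper.

```python
# Back-of-the-envelope unit conversions for the figures quoted in the abstract.
# Assumptions (not from the paper): an 8-hour workday and decimal SI units.

PFLOPS_PER_EFLOPS = 1000.0  # 1 EFLOPS = 1000 PFLOPS


def sustained_pflops_for(eflop_hours: float, hours: float) -> float:
    """Average fp32 rate (PFLOPS) needed to integrate `eflop_hours` within `hours`."""
    return eflop_hours * PFLOPS_PER_EFLOPS / hours


def hours_at_rate(eflop_hours: float, pflops: float) -> float:
    """Wall-clock hours needed to integrate `eflop_hours` at a constant `pflops` rate."""
    return eflop_hours * PFLOPS_PER_EFLOPS / pflops


def gbps_to_tb_per_hour(gbps: float) -> float:
    """Convert a link rate in Gbit/s to decimal terabytes transferred per hour."""
    bytes_per_second = gbps * 1e9 / 8  # bits -> bytes
    return bytes_per_second * 3600 / 1e12  # per hour, in TB


if __name__ == "__main__":
    # 1 EFLOP hour spread over an assumed 8-hour workday
    print(f"{sustained_pflops_for(1.0, 8.0):.1f} PFLOPS")  # 125.0 PFLOPS
    # The same integral if the 380 PFLOPS peak could be held constant
    print(f"{hours_at_rate(1.0, 380.0):.2f} h")  # 2.63 h
    # Data volume moved per hour at the reported link rates
    print(f"{gbps_to_tb_per_hour(100.0):.1f} TB/h")  # 45.0 TB/h
    print(f"{gbps_to_tb_per_hour(1000.0):.1f} TB/h")  # 450.0 TB/h
```

Under these assumptions, integrating 1 EFLOP hour in 8 hours requires an average of about 125 PFLOPS, roughly a third of the reported 380 PFLOPS peak, which is consistent with using only a cost-effective subset of instance types. A sustained 100 Gbps link to on-prem storage moves about 45 TB per hour.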

Updated: 2021-02-23