Cloud-Native Repositories for Big Scientific Data
Computing in Science & Engineering (IF 1.8), Pub Date: 2021-02-15, DOI: 10.1109/mcse.2021.3059437
Ryan P. Abernathey 1 , Tom Augspurger 2 , Anderson Banihirwe 3 , Charles C. Blackmon-Luca 1 , Timothy J. Crone 1 , Chelle L. Gentemann 4 , Joseph J. Hamman 3 , Naomi Henderson 1 , Chiara Lepore 1 , Theo A. McCaie 5 , Niall H. Robinson 5 , Richard P. Signell 6

Scientific data have traditionally been distributed via downloads from data servers to local computers. This way of working becomes limiting as scientific datasets grow toward the petabyte scale. A “cloud-native data repository,” as defined in this article, offers several advantages over traditional data repositories: performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges, such as organizing funding, training users, and enforcing data privacy requirements, must be resolved before cloud computing’s full potential for scientific research can be realized.
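The pairing of cloud-optimized formats with data-proximate, distributed computing rests on one idea: a large array is split into independently addressable chunks, so many workers can each fetch and reduce their own chunk in parallel, with no single download step. The stdlib-only Python sketch below illustrates that pattern under explicit assumptions. A plain `dict` stands in for a cloud object store, the `data/<i>` key scheme is hypothetical, and the helpers `build_store`, `read_chunk`, and `parallel_mean` are illustrative names. This is not Pangeo's implementation; it only loosely mimics how chunked formats such as Zarr keep each chunk as a separate object.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def build_store(values, chunk_size):
    """Split a 1-D float list into chunks, each stored under its own key.

    Hypothetical stand-in for an object store: one little-endian float64
    blob per chunk, keyed as 'data/<chunk index>'.
    """
    store = {}
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        store[f"data/{start // chunk_size}"] = struct.pack(f"<{len(chunk)}d", *chunk)
    return store


def read_chunk(store, key):
    """Fetch one chunk by key and decode it back into floats."""
    raw = store[key]
    return struct.unpack(f"<{len(raw) // 8}d", raw)


def chunk_partial(store, key):
    """Map step: each worker reduces only its own chunk to (sum, count)."""
    vals = read_chunk(store, key)
    return sum(vals), len(vals)


def parallel_mean(store, max_workers=4):
    """Reduce step: combine per-chunk partial results into a global mean."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda k: chunk_partial(store, k), list(store)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    store = build_store([float(x) for x in range(10)], chunk_size=3)
    print(sorted(store))         # ['data/0', 'data/1', 'data/2', 'data/3']
    print(parallel_mean(store))  # 4.5
```

Each worker needs only a chunk key, never the whole dataset, so in principle throughput grows with the number of workers. In a real deployment the threads would be replaced by distributed workers reading from object storage over the network, and an empty store would need a guard against division by zero.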

Updated: 2021-03-30