PODPAC: open-source Python software for enabling harmonized, plug-and-play processing of disparate earth observation data sets and seamless transition onto the serverless cloud by earth scientists
Earth Science Informatics (IF 2.8), Pub Date: 2020-08-28, DOI: 10.1007/s12145-020-00506-0
Mattheus P. Ueckermann, Jerry Bieszczad, Dara Entekhabi, Marc L. Shapiro, David R. Callendar, David Sullivan, Jeffrey Milloy

In this paper, we present the Pipeline for Observational Data Processing, Analysis, and Collaboration (PODPAC) software. PODPAC is an open-source Python library designed to support widespread exploitation of NASA earth science data by providing: multi-scale and multi-windowed access, exploration, and integration of available earth science data sets to support analysis and analytics; automatic accounting for geospatial data formats, projections, and resolutions; simplified implementation and parallelization of geospatial data processing routines; standardized sharing of data and algorithms; and seamless transition of algorithms and data products from local development to distributed, serverless processing on commercial cloud computing environments. We describe the key elements of PODPAC’s architecture, including Nodes for unified encapsulation of disparate scientific data sources; Algorithms for plug-and-play processing and harmonization of multiple data source Nodes; and Lambda functions for serverless execution and sharing of new data products via the cloud. We provide an overview of our open-source code implementation and testing process for development and deployment of PODPAC. We describe our interactive, JupyterLab-based end-user documentation, including quick-start examples and detailed use case studies. We conclude with examples of PODPAC’s application to: encapsulate data sources available on the Amazon Web Services (AWS) Open Data repository; harmonize processing of multiple earth science data sets for downscaling of NASA Soil Moisture Active Passive (SMAP) soil moisture data; and deploy a serverless SMAP-based drought monitoring application for access from mobile devices. We postulate that PODPAC will also be an effective tool for wrangling and standardizing massive earth science data sets for use in model training for machine learning applications.
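
The Node and Algorithm pattern described in the abstract can be illustrated with a small, self-contained sketch. This is not PODPAC's API: the class names `Node`, `GridSource`, and `Algorithm`, the `eval(lats, lons)` signature, and the nearest-neighbor resampling are all assumptions chosen for illustration. The sketch only shows the general idea of wrapping data sources that sit on different native grids behind one interface, then combining them on whatever grid the caller requests.

```python
from bisect import bisect_left
from typing import List, Sequence


class Node:
    """Hypothetical base class: anything evaluable at requested coordinates."""

    def eval(self, lats: Sequence[float], lons: Sequence[float]) -> List[List[float]]:
        raise NotImplementedError


def nearest_index(axis: Sequence[float], value: float) -> int:
    """Index of the entry in an ascending axis closest to value (ties go to the lower index)."""
    i = bisect_left(axis, value)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    return i if axis[i] - value < value - axis[i - 1] else i - 1


class GridSource(Node):
    """Hypothetical data source stored on its own native lat/lon grid."""

    def __init__(self, lats, lons, values):
        self.lats, self.lons, self.values = list(lats), list(lons), values

    def eval(self, lats, lons):
        # Nearest-neighbor resampling from the native grid onto the requested grid.
        rows = [nearest_index(self.lats, lat) for lat in lats]
        cols = [nearest_index(self.lons, lon) for lon in lons]
        return [[self.values[r][c] for c in cols] for r in rows]


class Algorithm(Node):
    """Hypothetical node that combines its inputs cell by cell on a common grid."""

    def __init__(self, func, **inputs):
        self.func, self.inputs = func, inputs

    def eval(self, lats, lons):
        # Every input is evaluated at the same requested coordinates,
        # so differing native resolutions are reconciled before combining.
        grids = {name: node.eval(lats, lons) for name, node in self.inputs.items()}
        return [
            [self.func(**{n: g[i][j] for n, g in grids.items()}) for j in range(len(lons))]
            for i in range(len(lats))
        ]


# A coarse 2x2 source and a fine 3x3 source covering the same area (made-up values).
coarse = GridSource([0.0, 1.0], [0.0, 1.0], [[10.0, 20.0], [30.0, 40.0]])
fine = GridSource([0.0, 0.5, 1.0], [0.0, 0.5, 1.0],
                  [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

combined = Algorithm(lambda a, b: a + b, a=coarse, b=fine)
result = combined.eval([0.0, 0.5, 1.0], [0.0, 1.0])
print(result)
```

Because `Algorithm` is itself a `Node`, its output can feed another `Algorithm`, which is one plausible reading of the "plug-and-play" composition the abstract mentions. A real library would also need to handle projections, units, and interpolation choices that this toy omits.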




Updated: 2020-08-28