HPC cluster-based user-defined data integration platform for deep learning in geoscience applications
Computers & Geosciences (IF 4.4), Pub Date: 2021-06-21, DOI: 10.1016/j.cageo.2021.104868
Guohua Li, Yeji Choi

Objective:

Efficient and flexible data integration platforms are important for applying diverse geoscience data to deep learning applications. Recently, deep learning techniques have been applied to analyze and predict natural phenomena in Earth sciences using geoscience data. Because geoscience data are considered “big data”, data-driven approaches such as deep learning are promising tools for understanding natural phenomena. In this paper, we propose a Geoscience Data Integration Platform (GeoDIP) for managing big geoscience data based on High-performance Computing (HPC) cluster systems.

Methodology:

GeoDIP provides data pre-processing and analysis modules that follow user-defined configurations when creating Artificial Intelligence (AI) ready datasets. To demonstrate the applicability of GeoDIP, we evaluated precipitation prediction performance using multiple datasets. We collected three datasets from different sources: two from satellite observations and one from reanalysis data for weather forecasting. We then compared the results obtained for each individual dataset with the results obtained for the integrated datasets.
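
As a rough illustration of what a user-defined configuration could look like in practice, the sketch below applies a per-dataset list of pre-processing steps and then concatenates the results into one integrated sample. Every name in it (`CONFIG`, `STEPS`, `preprocess`, `build_integrated_sample`, and the dataset keys) is hypothetical; the abstract does not describe GeoDIP's actual interface, and the thread pool is only a stand-in for the parallel processes of an HPC cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pre-processing steps; these are NOT GeoDIP's real modules.
def normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fill_missing(values, fill=0.0):
    """Replace missing entries (None) with a fill value."""
    return [fill if v is None else v for v in values]

STEPS = {"normalize": normalize, "fill_missing": fill_missing}

# Hypothetical user-defined configuration: which steps run on which dataset.
CONFIG = {
    "satellite_a": ["fill_missing", "normalize"],
    "satellite_b": ["normalize"],
    "reanalysis": ["fill_missing"],
}

def preprocess(name, values, config=CONFIG):
    """Run the configured steps for one dataset, in order."""
    for step in config[name]:
        values = STEPS[step](values)
    return name, values

def build_integrated_sample(raw, config=CONFIG, workers=3):
    """Pre-process each dataset concurrently, then concatenate in config order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(
            pool.map(lambda item: preprocess(item[0], item[1], config), raw.items())
        )
    integrated = []
    for name in config:
        integrated.extend(results[name])
    return integrated

# Toy inputs standing in for two satellite products and one reanalysis field.
raw = {
    "satellite_a": [2.0, None, 4.0],
    "satellite_b": [10.0, 20.0, 30.0],
    "reanalysis": [None, 1.5, 2.5],
}
sample = build_integrated_sample(raw)
```

Keeping the step list in data rather than in code is one way to let users change the pipeline per dataset without touching the processing functions; whether GeoDIP does this the same way is an assumption.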

Results:

The results confirmed that the integrated dataset generated by GeoDIP improved the F1-score of predictions over 2 h by 9%. This suggests that diverse information on the atmospheric conditions explaining precipitation genesis, drawn from multiple data sources, is crucial for precipitation prediction. To understand model performance, we conducted a permutation-based feature importance test, which confirmed that upper-level information is important over time. In addition, we evaluated the performance of GeoDIP by comparing sequential and parallel tasks and obtained a performance improvement of approximately 97%.
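
For readers unfamiliar with the permutation-based feature importance test mentioned above, the following is a minimal sketch of the general technique scored with F1. It is not the authors' implementation: the toy model `toy_predict`, the data `X` and `y`, and the helper names are all hypothetical, and the abstract does not say how many repeats or which model were used.

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 when one feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the samples with only feature j permuted.
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            score = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical toy model: predicts precipitation from feature 0 only.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0

X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
y = [1, 1, 0, 0, 1, 0]
baseline, importances = permutation_importance(toy_predict, X, y)
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction and its importance is exactly zero, while shuffling feature 0 lowers the F1-score. A larger drop is read as a more important feature.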

Conclusions:

The proposed GeoDIP facilitates the use of multiple datasets to analyze geophysical phenomena, using the parallel processes of HPC clusters to reduce the computational time for data pre-processing and analysis.




Updated: 2021-06-22