Fast Data-Obtaining Algorithm for Data Assimilation with Large Data Set, International Journal of Parallel Programming


Fast Data-Obtaining Algorithm for Data Assimilation with Large Data Set
International Journal of Parallel Programming (IF 0.9), Pub Date: 2019-12-06, DOI: 10.1007/s10766-019-00653-y
Junmin Xiao, Guizhao Zhang, Yanan Gao, Xuehai Hong, Guangming Tan

Data assimilation is an analysis technique that combines observations with numerical results from theoretical models to deduce more realistic and accurate data. It is widely used in studies of the atmosphere, ocean, and land surface. Because the inputs from dynamical models have a complicated data structure and the volume of model data keeps growing, parallel data assimilation suffers from high overhead in file reading and data communication. In this paper, we first propose a flexible parallel data access approach for reading large amounts of data from disk. This approach avoids data access conflicts and significantly reduces the number of disk addressing operations. Next, we design a communication-avoiding strategy that reduces the communication volume at the cost of some additional computation. Furthermore, we present a “pipe-flow” scheme for data exchange that performs conflict-free message passing. Together, these components form a fast data-obtaining algorithm for data assimilation. Our experiments show that the fast data-obtaining algorithm achieves a $5\times$ speedup over the baseline for data obtaining in parallel data assimilation. Owing to the reduction in disk addressing operations, the new approach achieves a $6\times$ speedup on average for file reading. Because a large amount of data movement is avoided, it achieves a $2.7\times$ speedup on average for communication between processors.
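The abstract does not describe how the “pipe-flow” scheme schedules its messages, so the following is only an illustrative sketch of what conflict-free message passing can mean in general, not the authors' algorithm. It assumes a simple ring-shift schedule: in round `step`, process `p` sends to process `(p + step) % num_procs`, so each process sends exactly one message and receives exactly one message per round. The function names `ring_schedule` and `is_conflict_free` are hypothetical and chosen for this example.

```python
def ring_schedule(num_procs):
    """Build a round-based all-to-all schedule (assumed ring-shift pattern).

    Returns a list of rounds; each round is a list of (sender, receiver)
    pairs. This is a generic textbook pattern, not the paper's scheme.
    """
    rounds = []
    for step in range(1, num_procs):
        # In this round every process sends to the process `step` places ahead.
        rounds.append([(p, (p + step) % num_procs) for p in range(num_procs)])
    return rounds


def is_conflict_free(round_pairs):
    """True if no process sends twice or receives twice within one round."""
    senders = [s for s, _ in round_pairs]
    receivers = [r for _, r in round_pairs]
    return len(set(senders)) == len(senders) and len(set(receivers)) == len(receivers)


if __name__ == "__main__":
    for i, pairs in enumerate(ring_schedule(4), start=1):
        print(f"round {i}: {pairs} conflict-free={is_conflict_free(pairs)}")
```

With `num_procs` processes the schedule needs `num_procs - 1` rounds, and every ordered pair of distinct processes exchanges data exactly once, so no receiver is ever contended for by two senders in the same round.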


Updated: 2019-12-06
