Relationship-aware Multivariate Sampling Strategy for Scientific Simulation Data


arXiv - CS - Graphics, Pub Date: 2020-08-31, DOI: arxiv-2008.13306
Subhashis Hazarika, Ayan Biswas, Phillip J. Wolfram, Earl Lawrence, Nathan Urban

With the increasing computational power of current supercomputers, the size of data produced by scientific simulations is rapidly growing. To reduce the storage footprint and facilitate scalable post-hoc analyses of such scientific data sets, various data reduction/summarization methods have been proposed over the years. Different flavors of sampling algorithms exist to sample the high-resolution scientific data, while preserving important data properties required for subsequent analyses. However, most of these sampling algorithms are designed for univariate data and cater to post-hoc analyses of single variables. In this work, we propose a multivariate sampling strategy which preserves the original variable relationships and enables different multivariate analyses directly on the sampled data. Our proposed strategy utilizes principal component analysis to capture the variance of multivariate data and can be built on top of any existing state-of-the-art sampling algorithms for single variables. In addition, we also propose variants of different data partitioning schemes (regular and irregular) to efficiently model the local multivariate relationships. Using two real-world multivariate data sets, we demonstrate the efficacy of our proposed multivariate sampling strategy with respect to its data reduction capabilities as well as the ease of performing efficient post-hoc multivariate analyses.
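The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the data is an array of shape (n_points, n_vars), that a "regular" partition is simply contiguous runs of rows, that each block is summarized by its first principal component, and that a simple rarity-weighted histogram sampler stands in for whichever single-variable sampling algorithm is actually used. The function names `pca_scores`, `univariate_sample`, and `multivariate_sample` are hypothetical.

```python
import numpy as np

def pca_scores(block):
    """Project an (n_points, n_vars) block onto its principal components."""
    centered = block - block.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def univariate_sample(values, n_samples, rng, bins=16):
    """Hypothetical stand-in for a single-variable sampler (an assumption):
    points that fall in sparsely populated histogram bins are more likely
    to be kept than points in crowded bins."""
    edges = np.histogram_bin_edges(values, bins=bins)
    bin_of = np.clip(np.digitize(values, edges[1:-1]), 0, bins - 1)
    counts = np.bincount(bin_of, minlength=bins)
    weights = 1.0 / counts[bin_of]
    return rng.choice(values.size, size=n_samples, replace=False,
                      p=weights / weights.sum())

def multivariate_sample(data, block_size, fraction, seed=0):
    """Sample each block of rows using its first principal component scores.

    Returns sorted row indices into `data`; all variables are kept at the
    selected rows, so relationships between variables are read off the
    same sample locations.
    """
    rng = np.random.default_rng(seed)
    kept = []
    for start in range(0, data.shape[0], block_size):
        block = data[start:start + block_size]
        n_keep = max(1, int(round(fraction * block.shape[0])))
        scores = pca_scores(block)[:, 0]  # first principal component only
        kept.append(start + univariate_sample(scores, n_keep, rng))
    return np.sort(np.concatenate(kept))
```

Under these assumptions, `data[multivariate_sample(data, block_size=100, fraction=0.1)]` would yield a reduced data set on which a multivariate statistic such as `np.corrcoef(sampled, rowvar=False)` can be computed directly. Using only the first principal component is a simplification chosen for brevity; the abstract does not say how many components are used.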


Updated: 2020-09-01
