Compression Ratio Modeling and Estimation Across Error Bounds for Lossy Compression
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-07-01, DOI: 10.1109/tpds.2019.2938503
Jinzhen Wang, Tong Liu, Qing Liu, Xubin He, Huizhang Luo, Weiming He

Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that must be reduced to lower storage and I/O costs. Lossy compressors trade data accuracy for reduction performance and have been shown to reduce data volume effectively. However, a key hurdle to their wide adoption is that the trade-off between data accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often have to try many candidate error bounds before they find an appropriate setup. The current practice of using lossy compressors is therefore trial and error, which is inefficient for large datasets that take a tremendous amount of computational resources to compress. This paper aims to analyze and estimate the compression performance of lossy compressors on HPC datasets. In particular, we predict the compression ratios of two modern lossy compressors with superior performance, SZ and ZFP, on HPC scientific datasets at various error bounds, based on the compressors’ intrinsic metrics collected under a given base error bound. We evaluate the estimation scheme on twenty real HPC datasets, and the results confirm the effectiveness of our approach.

Updated: 2020-07-01