Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data,IEEE Transactions on Parallel and Distributed Systems



Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-07-01, DOI: 10.1109/tpds.2020.2972548
Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, Haijun Zhang, Sheng Di, Dingwen Tao, Franck Cappello

Scientific simulations in high-performance computing (HPC) environments generate vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for postanalysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, error-controlled lossy compression not only significantly reduces the data size but also holds the promise of satisfying user demands on error control. Pointwise relative error bounds (i.e., the permitted compression error depends on the data value) are widely used by many scientific applications with lossy compression, since the error control adapts automatically to the magnitude of the values in the dataset. However, pointwise relative-error-bounded compression is complicated and time consuming. In this article, we develop efficient precomputation-based mechanisms based on the SZ lossy compression framework. Our mechanisms avoid the costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded compression while retaining excellent compression ratios. In addition, we reduce traversing operations for Huffman decoding, significantly accelerating the decompression process in SZ. Experiments with eight well-known real-world scientific simulation datasets show that our solution can improve the compression and decompression rates (i.e., the speed) by about 40 and 80 percent, respectively, in most cases, making our designed lossy compression strategy the best-in-class solution in most cases.
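
To make the idea of replacing a per-value logarithm with a precomputed table concrete, the following is a minimal sketch and not the authors' SZ implementation. It assumes positive values only (sign and zero handling are omitted) and uses hypothetical names (`build_table`, `quantize_log`, `quantize_lookup`, `reconstruct`). It relies on one known fact: if |ln x' - ln x| <= ln(1 + eps), then |x' - x| <= eps * |x|, so bins of width 2 * ln(1 + eps) in the log domain satisfy a pointwise relative error bound eps. The lookup here is a binary search over precomputed bin boundaries, chosen for brevity; the abstract does not describe the table layout that the paper actually uses.

```python
import bisect
import math


def build_table(eps, k_min, k_max):
    """Precompute bin boundaries and bin centers for codes k_min..k_max.

    Bin k covers (exp((k - 0.5) * w), exp((k + 0.5) * w)] with
    w = 2 * ln(1 + eps); its reconstruction value is exp(k * w).
    """
    w = 2.0 * math.log1p(eps)
    uppers = [math.exp((k + 0.5) * w) for k in range(k_min, k_max + 1)]
    centers = [math.exp(k * w) for k in range(k_min, k_max + 1)]
    lower = math.exp((k_min - 0.5) * w)  # smallest representable value
    return lower, uppers, centers


def quantize_log(x, eps):
    """Baseline: one logarithm per value."""
    w = 2.0 * math.log1p(eps)
    return round(math.log(x) / w)


def quantize_lookup(x, table, k_min):
    """Table variant: no logarithm at quantization time."""
    lower, uppers, _ = table
    if not (lower < x <= uppers[-1]):
        raise ValueError("value outside the precomputed table range")
    # First boundary that is >= x identifies the bin containing x.
    return k_min + bisect.bisect_left(uppers, x)


def reconstruct(code, table, k_min):
    """Map a quantization code back to its bin center."""
    return table[2][code - k_min]


if __name__ == "__main__":
    eps, k_min, k_max = 0.01, -1000, 1000
    table = build_table(eps, k_min, k_max)
    for x in (1e-3, 0.5, 1.0, 3.14159, 12345.678):
        code = quantize_lookup(x, table, k_min)
        approx = reconstruct(code, table, k_min)
        print(x, code, approx, abs(approx - x) / x <= eps * (1 + 1e-9))
```

The table is built once per error bound and reused for every value, which is the general sense in which precomputation trades memory for fewer transcendental function calls. The table range (`k_min`, `k_max`) is an assumption of this sketch and would have to cover the dataset's value range.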


Updated: 2020-07-01
