Algorithms for Efficient Reproducible Floating Point Summation
ACM Transactions on Mathematical Software ( IF 2.7 ) Pub Date : 2020-07-07 , DOI: 10.1145/3389360
Peter Ahrens 1 , James Demmel 2 , Hong Diep Nguyen 2

We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
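
To make the order dependence concrete, the following Python sketch first shows conventional (recursive) summation returning different results for two orderings of the same four numbers. It then shows one way to remove the dependence: round every input onto a shared grid so that all later additions are exact. This is a simplified two-pass illustration (it scans for the largest magnitude first), not the one-pass binned-number algorithm the abstract describes. The names `naive_sum` and `reproducible_sum`, the `folds` parameter, and the grid spacing rule are all assumptions made for the example, and the code assumes finite inputs with no overflow.

```python
import math


def naive_sum(values):
    """Conventional left-to-right summation; the result depends on order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values, folds=3):
    """Order-independent sum of finite floats (simplified two-pass sketch).

    Hypothetical helper, not the paper's algorithm: each input is split into
    up to `folds` slices lying on fixed grids, so every addition is exact.
    """
    xs = [float(v) for v in values]
    n = len(xs)
    biggest = max((abs(x) for x in xs), default=0.0)
    if biggest == 0.0:
        return 0.0
    # Bit gap between consecutive grids; leaves headroom for n exact additions.
    shift = 51 - n.bit_length()
    if shift < 1:
        raise ValueError("too many terms for this sketch")
    # Top exponent chosen so that n * biggest stays below 2**(exponent - 2).
    exponent = math.frexp(n * biggest)[1] + 2
    bins = []
    residuals = xs
    for _ in range(folds):
        if exponent - 52 < -1074:  # grid would fall below the subnormal range
            break
        anchor = math.ldexp(1.5, exponent)
        bin_sum = 0.0
        next_residuals = []
        for r in residuals:
            q = (r + anchor) - anchor     # r rounded to a multiple of 2**(exponent - 52)
            bin_sum += q                  # exact: partial sums stay on the grid
            next_residuals.append(r - q)  # exact rounding error, passed to the next grid
        bins.append(bin_sum)
        residuals = next_residuals
        exponent -= shift
    total = 0.0
    for s in reversed(bins):  # combine in a fixed order, finest grid first
        total += s
    return total


a = [1e16, 1.0, -1e16, 1.0]
b = [1e16, -1e16, 1.0, 1.0]  # same numbers, different order
print(naive_sum(a), naive_sum(b))                # 1.0 2.0
print(reproducible_sum(a), reproducible_sum(b))  # 2.0 2.0
```

Because the grids depend only on the number of terms and the largest magnitude, every ordering of the same inputs produces the same slices and the same exact bin sums. The price is a second pass over the data, which the one-pass accumulator described in the abstract avoids.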

Updated: 2020-07-07