Fast, good, and repeatable: Summations, vectorization, and reproducibility
The International Journal of High Performance Computing Applications (IF 3.5), Pub Date: 2020-07-14, DOI: 10.1177/1094342020938425
Brett Neuman, Andy Dubois, Laura Monroe, Robert W Robey

Enhanced-precision global sums are key to reproducibility in exascale applications. We examine two classic summation algorithms and show that vectorized versions are fast, good and reproducible at exascale. Both 256-bit and 512-bit implementations speed up the operation by almost a factor of four over the serial version. They thus demonstrate improved performance on global summations while retaining the numerical reproducibility of these methods.
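The abstract does not name the two summation algorithms it examines. As a purely illustrative sketch, the Python below uses Kahan compensated summation, one well-known enhanced-precision technique, and assumes it is representative of the kind of method meant. The function names (`naive_sum`, `kahan_sum`, `kahan_sum_lanes`) and the round-robin "lane" layout are hypothetical; the lane version only mimics how a 4-wide vector unit might keep independent partial sums and is not the authors' 256-bit or 512-bit implementation.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation (assumed example of an enhanced-precision sum)."""
    total = 0.0
    correction = 0.0  # estimate of the rounding error carried from earlier adds
    for x in values:
        y = x - correction            # apply the carried correction to the new term
        t = total + y                 # low-order bits of y may be lost here
        correction = (t - total) - y  # recover what was actually lost
        total = t
    return total


def kahan_sum_lanes(values, lanes=4):
    """Kahan summation with independent per-lane accumulators.

    Hypothetical stand-in for a vectorized loop: element i goes to lane
    i % lanes, and the lane totals and corrections are merged at the end.
    """
    totals = [0.0] * lanes
    corrections = [0.0] * lanes
    for i, x in enumerate(values):
        k = i % lanes
        y = x - corrections[k]
        t = totals[k] + y
        corrections[k] = (t - totals[k]) - y
        totals[k] = t
    # Merge lane totals together with their outstanding (negated) corrections.
    return kahan_sum(totals + [-c for c in corrections])


if __name__ == "__main__":
    data = [1.0] + [1e-16] * 100000
    reference = math.fsum(data)  # correctly rounded reference sum
    for name, fn in [("naive", naive_sum), ("kahan", kahan_sum),
                     ("kahan lanes", kahan_sum_lanes)]:
        print(name, abs(fn(data) - reference))
```

In this test data every small term is below half a unit in the last place of 1.0, so the naive loop discards all of them, while both compensated versions stay close to the reference. The lane version is not guaranteed to match the serial Kahan result bit for bit; this sketch makes no claim about how the paper achieves reproducibility.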

Updated: 2020-07-14