High Performance Computing for gravitational lens modeling: Single vs double precision on GPUs and CPUs
Astronomy and Computing (IF 2.5), Pub Date: 2019-10-31, DOI: 10.1016/j.ascom.2019.100340
M. Rexroth, C. Schäfer, G. Fourestey, J.-P. Kneib

Strong gravitational lensing is a powerful probe of cosmology and the dark matter distribution. Efficient lensing software is already a necessity to fully use its potential, and the performance demands will only increase with the upcoming generation of telescopes. In this paper, we present a proof-of-concept study on the impact of High Performance Computing techniques on a performance-critical part of the widely used lens modeling software LENSTOOL. For a simple parametric lens model, we implement the algorithm twice: once as a highly optimized CPU version and once with graphics card acceleration. In addition, we study the impact of finite machine precision on the lensing algorithm. While double precision is the default choice for scientific applications, we find that single precision can be sufficiently accurate for our purposes and lead to a large speedup. Therefore, we develop and present a mixed precision algorithm which only uses double precision when necessary. We measure the performance of the different implementations and find that the use of High Performance Computing techniques dramatically improves the code performance on both CPUs and GPUs. Compared to the current LENSTOOL implementation on 12 CPU cores, we obtain speedup factors of up to 170. We achieve this best performance by using our mixed precision algorithm on a high-end GPU of the kind common in modern supercomputers. We also show that these techniques reduce the energy consumption by up to 98%. Furthermore, we demonstrate that a highly competitive speedup can be reached with consumer GPUs. While they are an order of magnitude cheaper than high-end graphics cards, they are rarely used for scientific computations due to their low double precision performance. However, our mixed precision algorithm unlocks their full potential. Consequently, the consumer GPU delivers a speedup which is only a factor of four lower than the best speedup achieved by a high-end GPU.
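
The abstract does not say how the mixed precision algorithm decides when double precision is necessary. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: it uses a one-dimensional singular isothermal sphere lens equation, beta = theta - theta_E * sign(theta), as a stand-in for "a simple parametric lens model", emulates single precision with Python's `struct` module, and falls back to double precision only when the estimated single precision rounding error is large relative to the result. The function names, the error bound, and the `rel_tol` threshold are all hypothetical.

```python
import math
import struct

# Spacing of single-precision numbers near 1.0 (a conservative rounding bound).
F32_EPS = 2.0 ** -23


def f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def source_position_double(theta: float, theta_e: float) -> float:
    """1D SIS lens equation in double precision (illustrative model, an assumption)."""
    return theta - math.copysign(theta_e, theta)


def source_position_single(theta: float, theta_e: float) -> float:
    """Same formula with the inputs and the result rounded to single precision."""
    return f32(f32(theta) - math.copysign(f32(theta_e), theta))


def source_position_mixed(theta: float, theta_e: float, rel_tol: float = 1e-4):
    """Try single precision first; recompute in double only when needed.

    Hypothetical criterion: rounding the inputs costs an absolute error of at
    most about F32_EPS * (|theta| + |theta_E|). Close to the Einstein radius
    the two terms nearly cancel, so that error can dominate the small result.
    """
    beta = source_position_single(theta, theta_e)
    err_bound = F32_EPS * (abs(theta) + abs(theta_e))
    if err_bound > rel_tol * abs(beta):
        return source_position_double(theta, theta_e), "double"
    return beta, "single"


if __name__ == "__main__":
    # Far from the Einstein radius: single precision is accepted.
    print(source_position_mixed(2.0, 1.0))
    # Very close to the Einstein radius: the double-precision fallback is used.
    print(source_position_mixed(1.0000001, 1.0))
```

In a design like this, most evaluations stay in the fast single precision path, and only the numerically delicate ones pay for double precision, which is the property that would matter on consumer GPUs with low double precision throughput.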




Updated: 2019-10-31