Adaptive tiling for parallel N-body simulations on many core
Astronomy and Computing (IF 1.9), Pub Date: 2021-04-20, DOI: 10.1016/j.ascom.2021.100466
M.A. Khan, M.A. Al-Mouhamed, N. Mohammad

N-body simulations compute the mutual gravitational forces exerted on each body, which costs O(N) per body. The Barnes–Hut approximation allows a group of bodies to be processed in O(1) if they are far enough from a given body, which reduces the complexity of the whole simulation to O(N log N). An octree is used to ease the pruning process, but at the cost of some irregularity in the access pattern. In a parallel N-body implementation, the bodies are partitioned among threads that execute on multiple cores. Each body is processed with a depth-first traversal of the octree, which causes repeated cache misses during traversal. This paper proposes several tiling methods to improve the performance of N-body simulations. It presents an experimental analysis of octree traversal under these tiling methods to identify the potential for cache data reuse. It then evaluates the tiling methods for varying tile sizes, galaxy sizes, and thread counts on several machine architectures. The efficiency of the tiling approaches depends on the chosen tile size. It is shown that a speedup of 8 times can be achieved by choosing an appropriate tile size on a 60-core Intel accelerator. To determine an appropriate tile size, the paper proposes an adaptive tiling approach that implicitly adapts the tile size to the distribution of threads, the cache capacity, the cache latency, the problem size, and dynamic changes in the access pattern over the iterations. The proposed adaptive tiling approach can be used as an optimization option in parallel compilers.
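The abstract does not spell out the tiling methods themselves, so the following is only a minimal sketch of the general idea, under stated assumptions: a 2D quadtree stands in for the octree, G = 1, masses are positive, bodies lie in the unit square, and the names `build`, `interact`, `accel_per_body`, `accel_tiled`, `THETA` and `EPS` are hypothetical. It contrasts one depth-first traversal per body with one traversal per tile of bodies, where each tree node is fetched once per tile and reused by every body in the tile that still needs it. The visit counters are a rough proxy for node fetches, not a cache measurement, and the tile size is a fixed parameter here rather than adapted.

```python
import math
import random
from dataclasses import dataclass, field

THETA = 0.5   # opening threshold (assumed value)
EPS = 1e-3    # softening length; it also makes a body's self-term vanish

@dataclass
class Node:
    half: float   # half-width of the square cell
    mass: float   # total mass inside the cell
    mx: float     # centre of mass, x
    my: float     # centre of mass, y
    children: list = field(default_factory=list)

def build(bodies, cx=0.5, cy=0.5, half=0.5):
    """Build a quadtree over bodies [(x, y, m), ...] lying in a square cell."""
    if not bodies:
        return None
    mass = sum(b[2] for b in bodies)
    node = Node(half, mass,
                sum(b[0] * b[2] for b in bodies) / mass,
                sum(b[1] * b[2] for b in bodies) / mass)
    if len(bodies) > 1 and half > 1e-9:   # otherwise the cell stays a leaf
        h = half / 2
        for sx in (-1, 1):
            for sy in (-1, 1):
                sub = [b for b in bodies
                       if (b[0] >= cx) == (sx > 0) and (b[1] >= cy) == (sy > 0)]
                child = build(sub, cx + sx * h, cy + sy * h, h)
                if child:
                    node.children.append(child)
    return node

def interact(node, x, y):
    """Return (accepted, ax, ay) for one body at (x, y) against one cell."""
    dx, dy = node.mx - x, node.my - y
    r2 = dx * dx + dy * dy + EPS * EPS
    # Accept leaves always, and internal cells when size / distance < THETA.
    if not node.children or (2 * node.half) ** 2 < THETA ** 2 * r2:
        w = node.mass / (r2 * math.sqrt(r2))
        return True, dx * w, dy * w
    return False, 0.0, 0.0

def accel_per_body(root, bodies):
    """Baseline: one depth-first traversal per body."""
    out, visits = [], 0
    for x, y, _ in bodies:
        ax = ay = 0.0
        stack = [root]
        while stack:
            node = stack.pop()
            visits += 1
            accepted, fx, fy = interact(node, x, y)
            ax += fx
            ay += fy
            if not accepted:
                stack.extend(node.children)
        out.append((ax, ay))
    return out, visits

def accel_tiled(root, bodies, tile_size):
    """Tiled: one traversal per tile; each node is visited once per tile."""
    out = [[0.0, 0.0] for _ in bodies]
    visits = 0
    for start in range(0, len(bodies), tile_size):
        tile = list(range(start, min(start + tile_size, len(bodies))))
        stack = [(root, tile)]
        while stack:
            node, active = stack.pop()
            visits += 1
            still_open = []   # bodies in the tile that must descend further
            for i in active:
                accepted, fx, fy = interact(node, bodies[i][0], bodies[i][1])
                out[i][0] += fx
                out[i][1] += fy
                if not accepted:
                    still_open.append(i)
            if still_open:
                stack.extend((c, still_open) for c in node.children)
    return [tuple(a) for a in out], visits
```

Both functions apply the same acceptance test to every body, so they should agree on the accelerations; only the number of node visits differs. In this sketch tiles are consecutive slices of the input list; grouping spatially close bodies into the same tile would presumably increase the overlap between their traversals.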




Updated: 2021-04-30