Efficient GPU implementation of the Particle-in-Cell/Monte-Carlo collisions method for 1D simulation of low-pressure capacitively coupled plasmas, Computer Physics Communications


Computer Physics Communications (IF 6.3), Pub Date: 2021-02-26, DOI: 10.1016/j.cpc.2021.107913
Zoltan Juhasz, Ján Ďurian, Aranka Derzsi, Štefan Matejčík, Zoltán Donkó, Peter Hartmann

In this paper, we describe an efficient, massively parallel GPU implementation strategy for speeding up one-dimensional electrostatic plasma simulations based on the Particle-in-Cell method with Monte-Carlo collisions. Relying on the Roofline performance model, we identify the performance-critical points of the program and provide optimised solutions. We use four benchmark cases to verify the correctness of the CUDA and OpenCL implementations and analyse their performance properties on a number of NVIDIA and AMD cards. Plasma parameters computed with the two GPU implementations differ by no more than 2% from each other and from the respective literature reference data. Our final implementations reach over 2.6 Tflop/s sustained performance on a single card and show speed-up factors of up to 200 (when using 10 million particles). We demonstrate that GPUs can be used very efficiently for simulating collisional plasmas and argue that their further use will enable more accurate simulations in less time, increase research productivity and help advance the science of plasma simulation.
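As a rough illustration of the Roofline model mentioned in the abstract, the sketch below computes the basic Roofline bound: attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity (flop per byte moved). The function names and the hardware figures are hypothetical placeholders chosen for round numbers. They are not taken from the paper or from any specific NVIDIA or AMD card.

```python
def roofline_bound(peak_flops: float, mem_bandwidth: float,
                   arithmetic_intensity: float) -> float:
    """Attainable performance (flop/s) under the basic Roofline model.

    peak_flops           -- peak compute rate of the device, in flop/s
    mem_bandwidth        -- peak memory bandwidth, in byte/s
    arithmetic_intensity -- flop performed per byte moved, in flop/byte
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (flop/byte) at which the two limits meet."""
    return peak_flops / mem_bandwidth


# Hypothetical device figures, for illustration only.
PEAK_FLOPS = 10.0e12      # 10 Tflop/s
MEM_BANDWIDTH = 500.0e9   # 500 GB/s

if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)
    for intensity in (0.5, 4.0, 20.0, 64.0):
        bound = roofline_bound(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"{intensity:5.1f} flop/byte: {bound / 1e12:5.2f} Tflop/s ({regime})")
```

A kernel whose arithmetic intensity lies below the ridge point is limited by memory traffic, so reducing data movement helps more than reducing arithmetic. Above the ridge point the opposite holds.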


Updated: 2021-03-12
