GPU-accelerated event reconstruction for the COMET Phase-I experiment
Computer Physics Communications (IF 7.2), Pub Date: 2021-01-01, DOI: 10.1016/j.cpc.2020.107606
Beomki Yeo, MyeongJae Lee, Yoshitaka Kuno

Abstract: This paper discusses a parallelized event reconstruction for the COMET Phase-I experiment. The experiment aims to discover charged lepton flavor violation by observing 104.97 MeV electrons from neutrinoless muon-to-electron conversion in muonic atoms. Reconstructing electrons whose tracks make multiple helix turns is challenging because classifying hits by turn (hit-to-turn classification) is computationally expensive. The introduced algorithm finds an optimal seed of position and momentum for each turn partition by examining the residual sum of squares of the distances of closest approach (DCA) between the hits and a track extrapolated from the seed. Hits with a DCA below a cutoff value are assigned to the turn represented by that seed. The classification performance was optimized by tuning the cutoff value and refining the set of classified hits. The workload was parallelized over seeds and hits with two GPU kernels: one records the track parameters extrapolated from the seeds, and the other finds the DCAs of the hits. Reasonable efficiency and momentum resolution were obtained over a wide momentum range covering both signal and background electrons. The CPU and GPU reconstructions produced identical results. The benchmarked GPUs achieved roughly an order-of-magnitude speedup over a 16-core CPU, although the exact gain varied with GPU architecture.
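The abstract describes the seed search only at a high level. The following is a minimal Python sketch of the idea under several assumptions of ours, not the authors' implementation: hits are treated as 3D points rather than drift cells on stereo wires; the field is a uniform 1 T along +z; each seed is extrapolated half a turn either side of itself; the DCA is approximated by the nearest of a densely sampled set of helix points; and seeds are ranked by a truncated residual sum of squares in which each hit's DCA is capped at the cutoff. The names `Seed`, `extrapolate`, `dca` and `best_seed` are hypothetical. The loop over seeds and the loop over hits only mirror the roles the abstract assigns to the two GPU kernels.

```python
import math
from dataclasses import dataclass

C_GEV_PER_T_M = 0.299792458  # pT [GeV/c] = 0.2998 * |q| * B [T] * R [m]


@dataclass
class Seed:
    """Hypothetical per-turn seed: a point on the track plus its momentum."""
    x: float   # position [m]
    y: float
    z: float
    px: float  # momentum [GeV/c]
    py: float
    pz: float
    charge: int = -1  # electron


def extrapolate(seed, b_tesla=1.0, n_steps=2000):
    """Role of 'kernel 1': record helix points half a turn either side of the seed.

    Assumes a uniform magnetic field of b_tesla along +z."""
    pt = math.hypot(seed.px, seed.py)
    radius = pt / (C_GEV_PER_T_M * abs(seed.charge) * b_tesla)
    h = -1 if seed.charge * b_tesla > 0 else 1  # +1: counter-clockwise seen from +z
    phi0 = math.atan2(seed.py, seed.px)         # initial transverse direction
    cx = seed.x - h * radius * math.sin(phi0)   # circle centre
    cy = seed.y + h * radius * math.cos(phi0)
    points = []
    for k in range(n_steps + 1):
        t = -math.pi + 2.0 * math.pi * k / n_steps  # turning angle relative to seed
        phi = phi0 + h * t
        points.append((cx + h * radius * math.sin(phi),
                       cy - h * radius * math.cos(phi),
                       seed.z + seed.pz / pt * radius * t))
    return points


def dca(hit, points):
    """Role of 'kernel 2': approximate DCA as distance to the nearest sampled point."""
    return min(math.dist(hit, p) for p in points)


def best_seed(seeds, hits, cutoff, b_tesla=1.0):
    """Return (truncated RSS, seed, indices of hits with DCA < cutoff) for the best seed."""
    best = None
    for seed in seeds:                                # would be parallel over seeds
        points = extrapolate(seed, b_tesla)
        dcas = [dca(hit, points) for hit in hits]     # would be parallel over hits
        rss = sum(min(d, cutoff) ** 2 for d in dcas)  # capped so other turns don't dominate
        if best is None or rss < best[0]:
            best = (rss, seed, [i for i, d in enumerate(dcas) if d < cutoff])
    return best


# Usage: hits sampled from a known helix plus one noise hit; the wrong seed has
# the same position and direction but 70% of the momentum.
true_seed = Seed(0.5, 0.0, 0.0, 0.0, 0.1, 0.03)
wrong_seed = Seed(0.5, 0.0, 0.0, 0.0, 0.07, 0.021)
hits = extrapolate(true_seed, n_steps=20) + [(2.0, 2.0, 2.0)]
rss, chosen, classified = best_seed([wrong_seed, true_seed], hits, cutoff=0.005)
print(chosen == true_seed, classified)
```

Capping each residual at the cutoff is one way to keep hits from other turns from swamping the sum, so a seed is rewarded both for capturing many hits and for fitting them closely; the paper's actual objective and its hit-refinement step may differ. The sampled-point DCA is only accurate to about half the sampling step, which is why `n_steps` is kept large relative to the cutoff.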

