A Fast Parallel Particle Filter for Shared Memory Systems
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3014035
Alessandro Varsi, Jack Taylor, Lykourgos Kekempanos, Edward Pyzer Knapp, Simon Maskell

Particle Filters (PFs) are Sequential Monte Carlo methods that are widely used to solve filtering problems for dynamic models under non-linear, non-Gaussian noise. Modern PF applications have demanding accuracy and run-time constraints that can be addressed through parallel computing. However, PFs can only be parallelized efficiently by effectively parallelizing the bottleneck: resampling and its constituent redistribution step. A pre-existing implementation of redistribute on Shared Memory Architectures (SMAs) achieves $O(\frac{N}{T}\log_2 N)$ time complexity over $T$ parallel cores. This redistribute implementation is, however, highly computationally intensive and cannot be effectively parallelized because of the inherently limited number of cores on SMAs. In this paper, we propose a novel parallel redistribute on OpenMP 4.5 which takes $O(\frac{N}{T} + \log_2 N)$ steps and fully exploits the computational power of SMAs. The proposed approach is up to six times faster than the $O(\frac{N}{T}\log_2 N)$ one, and its implementation on GPU provides a further threefold speed-up over its equivalent on a 32-core CPU. We also show on an exemplary PF that our redistribution is no longer the bottleneck.
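
For readers unfamiliar with the redistribution step named in the abstract, the following is a minimal sequential sketch of what it computes, not the paper's algorithm. It assumes resampling has already produced, for each of the $N$ particles, a non-negative copy count, and that these counts sum to $N$. The names `redistribute`, `particles`, and `ncopies` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def redistribute(particles, ncopies):
    """Return a new list in which particles[i] appears ncopies[i] times.

    Illustrative only: a plain sequential version, assuming the copy
    counts come from a resampling step and sum to len(particles).
    """
    n = len(particles)
    if len(ncopies) != n or sum(ncopies) != n:
        raise ValueError("ncopies needs one entry per particle and must sum to N")

    # Exclusive prefix sum: offsets[i] is the first output slot for particle i.
    offsets = [0] + list(accumulate(ncopies))[:-1]

    out = [None] * n
    for i in range(n):
        # Particle i owns the output slots offsets[i] .. offsets[i] + ncopies[i] - 1.
        for k in range(ncopies[i]):
            out[offsets[i] + k] = particles[i]
    return out


if __name__ == "__main__":
    # Particle "b" is copied three times, "d" once, the others are dropped.
    print(redistribute(["a", "b", "c", "d"], [0, 3, 0, 1]))
```

Each particle writes to a disjoint range of output slots, so the outer loop could in principle be divided among threads. A static split over input particles can be badly unbalanced, however, because a single particle may receive up to $N$ copies; balancing that work is the kind of difficulty a parallel redistribute has to address.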

Last updated: 2020-01-01