Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-07-07, DOI: 10.1109/tpds.2021.3095230
Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi, Benjamin Berkels

Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this article, we study the recursive registration of a series of electron microscopy images, a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1,024 Intel Haswell cores), enabling the derivation of material properties at the nanoscale for long microscopy image series.
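
To illustrate the idea of recasting a seemingly sequential chain as a prefix scan, the following is a minimal Python sketch, not the authors' implementation. It assumes a stand-in associative operator, `compose`, that composes simple 1D affine maps stored as `(a, b)` pairs; in the article the operator is the image registration itself, which is far more expensive and varies in cost. The names `compose`, `sequential_scan`, and `blocked_scan` are hypothetical. The blocked variant shows a two-level (hierarchical) scan whose first phase is independent per block; the work-stealing procedure is not shown.

```python
from itertools import accumulate


def compose(f, g):
    """Apply f first, then g. A map x -> a*x + b is stored as (a, b).

    Composition is associative but not commutative, which is all a
    prefix scan requires of its operator.
    """
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(items, op):
    """Inclusive prefix scan: result[i] = items[0] op ... op items[i]."""
    return list(accumulate(items, op))


def blocked_scan(items, op, block_size):
    """Two-level inclusive scan over fixed-size blocks.

    Phase 1 scans each block independently (this is the part that could
    run in parallel), phase 2 scans the block totals, and phase 3
    combines each block's local results with the total of all
    preceding blocks.
    """
    blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
    local = [sequential_scan(block, op) for block in blocks]    # phase 1
    totals = sequential_scan([scan[-1] for scan in local], op)  # phase 2
    result = list(local[0]) if local else []
    for k in range(1, len(local)):                              # phase 3
        preceding = totals[k - 1]
        result.extend(op(preceding, x) for x in local[k])
    return result


# Hypothetical pairwise maps between neighbouring items in a series.
pairwise = [(1, 2), (2, 0), (1, -3), (3, 1), (1, 5)]
assert blocked_scan(pairwise, compose, 2) == sequential_scan(pairwise, compose)
```

Because the operator is associative, both functions produce the same prefixes; the blocked form only changes the order in which the work is done, which is what allows the blocks to be processed concurrently.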

Updated: 2021-07-07