Parallel Algorithms for Successive Convolution
Journal of Scientific Computing (IF 2.8), Pub Date: 2020-12-08, DOI: 10.1007/s10915-020-01359-x
Andrew J. Christlieb, Pierson T. Guthrey, William A. Sands, Mathialakan Thavappiragasam

The development of modern computing architectures with ever-increasing amounts of parallelism has allowed for the solution of previously intractable problems across a variety of scientific disciplines. Despite these advances, multiscale computing problems continue to pose a formidable challenge to modern architectures because they require resolving scales that often vary by orders of magnitude in both space and time. Such complications have led us to consider alternative discretizations for partial differential equations (PDEs) which use expansions involving integral operators to approximate spatial derivatives (Christlieb et al. in J Comput Phys 379:214–236, 2019; Christlieb et al. in J Sci Comput 82:52(3):1–29, 2020; Christlieb et al. in J Comput Phys 415:1–25, 2020). These constructions use explicit information within the integral terms, but treat boundary data implicitly, which contributes to the overall speed of the method. This approach is provably unconditionally stable for linear problems, and stability has been demonstrated experimentally for nonlinear problems. Additionally, it is matrix-free in the sense that it is not necessary to invert linear systems, and iteration is not required for nonlinear terms. Moreover, the scheme employs a fast summation algorithm that yields a method with a computational complexity of \(\mathcal{O}(N)\), where \(N\) is the number of mesh points along a coordinate direction. While much work has been done to explore the theory behind these methods, their practicality in large-scale computing environments is a largely unexplored topic. In this work, we explore the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems along with shared memory algorithms. As a first pass, we derive an artificial Courant–Friedrichs–Lewy condition that enforces a nearest-neighbor (N-N) communication pattern and briefly discuss possible generalizations. We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos (Edwards and Trott in J Parallel Distrib Comput 74:3202–3216, 2014) for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate \(> 1\times 10^8\) DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms using several different PDE test problems, including a nonlinear example that employs an adaptive time-stepping rule.
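The \(\mathcal{O}(N)\) fast summation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the kernel or the quadrature, so everything specific here is an assumption made for illustration: a one-sided exponential kernel, a uniform mesh, a trapezoidal rule on each cell, and the hypothetical names `left_convolution`, `left_convolution_direct`, and `alpha`. The idea shown is only that an exponential kernel lets the integral at one mesh point be obtained from the previous point with a single multiply and a local update, rather than by re-summing over all earlier cells.

```python
import math


def left_convolution(u, dx, alpha):
    """Approximate I[i] = alpha * integral_{x_0}^{x_i} exp(-alpha*(x_i - y)) u(y) dy.

    Uses the recurrence I[i] = exp(-alpha*dx) * I[i-1] + (local integral on the
    last cell), so the total cost is O(N). The trapezoidal local rule is an
    assumption of this sketch, not the quadrature used in the paper.
    """
    decay = math.exp(-alpha * dx)
    result = [0.0] * len(u)
    for i in range(1, len(u)):
        # Trapezoidal rule on [x_{i-1}, x_i]; the kernel equals `decay` at the
        # left endpoint and 1 at the right endpoint.
        local = 0.5 * alpha * dx * (decay * u[i - 1] + u[i])
        result[i] = decay * result[i - 1] + local
    return result


def left_convolution_direct(u, dx, alpha):
    """Same discrete quantity, summed cell by cell for every point: O(N^2)."""
    n = len(u)
    result = [0.0] * n
    for i in range(1, n):
        total = 0.0
        for j in range(1, i + 1):
            # Kernel evaluated at both endpoints of cell j, relative to x_i.
            w_left = math.exp(-alpha * dx * (i - j + 1))
            w_right = math.exp(-alpha * dx * (i - j))
            total += 0.5 * alpha * dx * (w_left * u[j - 1] + w_right * u[j])
        result[i] = total
    return result
```

Under these assumptions the two functions agree to rounding error, while only the first scales linearly with the number of mesh points. A mirrored right-to-left sweep would handle the other half of a two-sided kernel in the same way.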




Updated: 2020-12-08