Kernel aggregated fast multipole method

Yan, Wen; Blackwell, Robert

doi:10.1007/s10444-021-09896-1

Efficient summation of Laplace and Stokes kernel functions

Published: 06 September 2021

Advances in Computational Mathematics, Volume 47, article number 69 (2021)

Abstract

Many simulation methods for Stokes flow problems share a common, computationally intensive task: the summation of a kernel function over O(N²) pairs of points. One popular technique is the kernel independent fast multipole method (KIFMM), which constructs a spatially adaptive octree for all points, places a small number of multipole and local equivalent points around each octree box, and uses these equivalent points to complete the kernel sum with O(N) cost. Simpler kernels can be used between these equivalent points to improve the efficiency of KIFMM. Here we present further extensions and applications of this idea, to enable efficient summations and flexible boundary conditions for various kernels. We call our method the kernel aggregated fast multipole method (KAFMM), because it uses different kernel functions at different stages of octree traversal. We have implemented our method as an open-source software library, STKFMM, based on the high-performance library PVFMM, with support for Laplace kernels, the Stokeslet, regularized Stokeslet, Rotne-Prager-Yamakawa (RPY) tensor, and the Stokes double-layer and traction operators. Open and periodic boundary conditions are supported for all kernels, and the no-slip wall boundary condition is supported for the Stokeslet and RPY tensor. The package is designed to be ready to use as well as readily extensible to additional kernels.
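For orientation, the O(N²) task that the FMM accelerates can be written as a direct sum. The sketch below evaluates the free-space Stokeslet sum u(x_t) = 1/(8πμ) Σ_s [ f_s / r + r (r · f_s) / r³ ], with r = x_t − x_s, using NumPy. It is not part of STKFMM and does not reflect its interface: the function name `stokeslet_direct`, the viscosity argument `mu`, and the convention of skipping coincident points are assumptions made purely for illustration.

```python
import numpy as np


def stokeslet_direct(src, force, trg, mu=1.0):
    """Direct O(Ns * Nt) Stokeslet sum (illustrative, not the STKFMM API).

    src   : (Ns, 3) source positions
    force : (Ns, 3) point forces located at the sources
    trg   : (Nt, 3) target positions
    Returns the (Nt, 3) velocities induced at the targets.
    """
    src = np.asarray(src, dtype=float)
    force = np.asarray(force, dtype=float)
    trg = np.asarray(trg, dtype=float)

    # r[t, s, :] = x_t - x_s for every target/source pair
    r = trg[:, None, :] - src[None, :, :]
    dist = np.linalg.norm(r, axis=2)

    # Assumption: coincident source/target pairs (r = 0) are skipped.
    mask = dist > 0.0
    inv_r = np.zeros_like(dist)
    inv_r[mask] = 1.0 / dist[mask]
    inv_r3 = inv_r**3

    # (r . f_s) for each target/source pair
    r_dot_f = np.einsum("tsk,sk->ts", r, force)

    # u_t = sum_s [ f_s / r + r (r . f_s) / r^3 ] / (8 pi mu)
    u = np.einsum("ts,sk->tk", inv_r, force)
    u += np.einsum("ts,tsk->tk", inv_r3 * r_dot_f, r)
    return u / (8.0 * np.pi * mu)
```

Both the work and the temporary storage in this sketch grow as Ns × Nt, which is the cost that motivates an O(N) method for large point sets.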

References

Cortez, R.: Regularized Stokeslet segments. J. Comput. Phys. 375, 783–796 (2018). https://doi.org/10.1016/j.jcp.2018.08.055
Cortez, R., Fauci, L., Medovikov, A.: The method of regularized Stokeslets in three dimensions: Analysis, validation, and application to helical swimming. Phys. Fluids 17(3), 031504 (2005). https://doi.org/10.1063/1.1830486
Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987). https://doi.org/10.1016/0021-9991(87)90140-9
Guan, W., Cheng, X., Huang, J., Huber, G., Li, W., McCammon, J. A., Zhang, B.: RPYFMM: Parallel adaptive fast multipole method for Rotne–Prager–Yamakawa tensor in biomolecular hydrodynamics simulations. Comput. Phys. Commun. 227, 99–108 (2018). https://doi.org/10.1016/j.cpc.2018.02.005
Klinteberg, L. A., Shamshirgar, D. S., Tornberg, A. K.: Fast Ewald summation for free-space Stokes potentials. Res. Math. Sci. 4(1), 1 (2017). https://doi.org/10.1186/s40687-016-0092-7
LaGrone, J., Cortez, R., Yan, W., Fauci, L.: Complex dynamics of long, flexible fibers in shear. J. Non-Newtonian Fluid Mech. 269, 73–81 (2019). https://doi.org/10.1016/j.jnnfm.2019.06.007
Liang, Z., Gimbutas, Z., Greengard, L., Huang, J., Jiang, S.: A fast multipole method for the Rotne–Prager–Yamakawa tensor and its applications. J. Comput. Phys. 234, 133–139 (2013). https://doi.org/10.1016/j.jcp.2012.09.021
Lindbo, D., Tornberg, A. K.: Fast and spectrally accurate Ewald summation for 2-periodic electrostatic systems. J. Chem. Phys. 136(16), 164111 (2012). https://doi.org/10.1063/1.4704177
Malhotra, D., Biros, G.: PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials. Commun. Comput. Phys. 18(3), 808–830 (2015). https://doi.org/10.4208/cicp.020215.150515sw
Mizerski, K. A., Wajnryb, E., Zuk, P. J., Szymczak, P.: The Rotne–Prager–Yamakawa approximation for periodic systems in a shear flow. J. Chem. Phys. 140(18), 184103 (2014). https://doi.org/10.1063/1.4871113
Olson, S. D., Lim, S., Cortez, R.: Modeling the dynamics of an elastic rod with intrinsic curvature and twist using a regularized Stokes formulation. J. Comput. Phys. 238, 169–187 (2013). https://doi.org/10.1016/j.jcp.2012.12.026
Rostami, M. W., Olson, S. D.: Kernel-independent fast multipole method within the framework of regularized Stokeslets. J. Fluids Struct. 67, 60–84 (2016). https://doi.org/10.1016/j.jfluidstructs.2016.07.006
Rotne, J., Prager, S.: Variational treatment of hydrodynamic interaction in polymers. J. Chem. Phys. 50(11), 4831–4837 (1969). https://doi.org/10.1063/1.1670977
Shamshirgar, D. S., Tornberg, A. K.: The spectral Ewald method for singly periodic domains. J. Comput. Phys. 347, 341–366 (2017). https://doi.org/10.1016/j.jcp.2017.07.001
Srinivasan, S., Tornberg, A. K.: Fast Ewald summation for Green’s functions of Stokes flow in a half-space. Res. Math. Sci. 5(3), 35 (2018). https://doi.org/10.1007/s40687-018-0153-1
Swan, J. W., Brady, J. F.: Anisotropic diffusion in confined colloidal dispersions: the evanescent diffusivity. J. Chem. Phys. 135(1), 014701 (2011). https://doi.org/10.1063/1.3604530
Tornberg, A. K.: The Ewald sums for singly, doubly and triply periodic electrostatic systems. Adv. Comput. Math. 42(1), 227–248 (2015). https://doi.org/10.1007/s10444-015-9422-3
Wajnryb, E., Mizerski, K. A., Zuk, P. J., Szymczak, P.: Generalization of the Rotne–Prager–Yamakawa mobility and shear disturbance tensors. J. Fluid Mech. 731 (2013). https://doi.org/10.1017/jfm.2013.402
Wang, L.: A Kernel-Independent Treecode for General Rotne–Prager–Yamakawa Tensor. Adv. Appl. Math. Mech. 13(2), 296–310 (2021). https://doi.org/10.4208/aamm.OA-2019-0322
Wang, M., Brady, J. F.: Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions. J. Comput. Phys. 306, 443–477 (2016). https://doi.org/10.1016/j.jcp.2015.11.042
Yamakawa, H.: Transport properties of polymer chains in dilute solution: Hydrodynamic interaction. J. Chem. Phys. 53(1), 436–443 (1970). https://doi.org/10.1063/1.1673799
Yan, W., Brady, J. F.: The behavior of active diffusiophoretic suspensions: An accelerated Laplacian dynamics study. J. Chem. Phys. 145(13), 134902 (2016). https://doi.org/10.1063/1.4963722
Yan, W., Corona, E., Malhotra, D., Veerapaneni, S., Shelley, M.: A scalable computational platform for particulate Stokes suspensions. J. Comput. Phys. 416, 109524 (2020). https://doi.org/10.1016/j.jcp.2020.109524
Yan, W., Shelley, M.: Flexibly imposing periodicity in kernel independent FMM: a multipole-to-local operator approach. J. Comput. Phys. 355, 214–232 (2018). https://doi.org/10.1016/j.jcp.2017.11.012
Yan, W., Shelley, M.: Universal image systems for non-periodic and periodic Stokes flows above a no-slip wall. J. Comput. Phys. 375, 263–270 (2018). https://doi.org/10.1016/j.jcp.2018.08.041
Ying, L., Biros, G., Zorin, D.: A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196(2), 591–626 (2004). https://doi.org/10.1016/j.jcp.2003.11.021
Zuk, P. J., Wajnryb, E., Mizerski, K. A., Szymczak, P.: Rotne–Prager–Yamakawa approximation for different-sized particles in application to macromolecular bead models. J. Fluid Mech. 741 (2014). https://doi.org/10.1017/jfm.2013.668. http://journals.cambridge.org/article_S002211201300668X

Acknowledgements

We thank Dhairya Malhotra and Alex Barnett for inspiring discussions, and thank Bryce Palmer for comments on the manuscript.

Author information

Authors and Affiliations

Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA
Wen Yan & Robert Blackwell

Authors

Wen Yan
Robert Blackwell

Corresponding author

Correspondence to Wen Yan.

Additional information

Communicated by: Michael O’Neil

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Advances in Computational Integral Equations

Guest Editors: Stephanie Chaillat, Adrianna Gillman, Per-Gunnar Martinsson, Michael O’Neil, Mary-Catherine Kropinski, Timo Betcke, Alex Barnett

Appendices

Appendix A. All kernels supported by the open-source STKFMM package

Table 12 All kernels currently supported by the STKFMM package. All are supported for open boundary conditions and for singly, doubly, and triply periodic boundary conditions.

Appendix B. Overhead of periodic boundary conditions

In contrast to our previous implementation [24], where the M → L operator that imposes the periodic boundary condition was applied as a postfix step, we have merged this step into the downward pass (L → L and L → T) of KIFMM itself, with the help of Dr. Malhotra. As a result, the extra overhead of running KIFMM with periodic rather than open boundary conditions is significantly reduced. Figure 9 compares the tree construction time and the FMM evaluation time for four different kernels under different boundary conditions. The overhead of imposing periodic boundary conditions is typically about 5% of the evaluation time with the free (open) boundary condition.

Appendix C. Note about our SIMD implementation

Although the benchmarks presented in Section 3 were run on CPUs (Intel Skylake and Cascade Lake) that support AVX-512 instructions, our kernels are hand-optimized with AVX2 instructions only. Because the AVX-512 instruction set is still evolving, with new subsets of instructions still being added, we plan to support AVX-512 once the instruction set is more stable.

Appendix D. Parallel scaling

Detailed parallel scaling benchmarks and analysis are beyond the scope of this paper because we implemented our package based on the PVFMM library without modifying any of its advanced parallelization facilities. Since the parallel scaling of PVFMM has been thoroughly tested [9], we present only some brief test results here. We tested strong and weak scaling for the low-precision case m = 8 and the high-precision case m = 12. The scaling benchmarks were measured on a cluster where each node was equipped with two Intel Cascade Lake 8268 CPUs @ 2.9 GHz. Each CPU had 24 cores, hyper-threading was disabled, and the machines were set to “performance” mode. We launched two MPI ranks on each CPU. Each MPI rank launched 12 OpenMP threads and each thread was bound to one CPU core. The number of source and target points was increased compared to the convergence tests in the previous sections, and we increased the box size to L = 100. We again used a non-uniform distribution of points picked from the log-normal distribution in Eq. (36).
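The hybrid MPI/OpenMP layout described above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names and the helper `total_resources` are hypothetical and are not part of STKFMM or PVFMM.

```python
# Node layout as described in the text.
cpus_per_node = 2
cores_per_cpu = 24
ranks_per_cpu = 2       # MPI ranks launched on each CPU
threads_per_rank = 12   # OpenMP threads launched by each rank

cores_per_node = cpus_per_node * cores_per_cpu
ranks_per_node = cpus_per_node * ranks_per_cpu
threads_per_node = ranks_per_node * threads_per_rank

# One thread bound to each core: the node is fully used, not oversubscribed.
assert threads_per_node == cores_per_node


def total_resources(nodes):
    """Hypothetical helper: MPI ranks and cores used by a run on `nodes` nodes."""
    return {"ranks": nodes * ranks_per_node, "cores": nodes * cores_per_node}
```

With this layout each MPI rank occupies half of one CPU, so the OpenMP threads of a rank never span two sockets.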

We tested strong scaling using the RPY kernel with a doubly periodic boundary condition and 2.7 × 10⁷ target points and source points. Figure 10a shows the results, which are comparable to 70% parallel efficiency (dashed black line). We tested weak scaling using the StokesPVel kernel G^P with 4 million target points per node, i.e., 1 million points for each MPI rank. As shown in Fig. 10b, the t_run results are again comparable to 70% parallel efficiency. These results are completely determined by the PVFMM library, and are close to its benchmarks [9].
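For readers who want to compare their own timings against the 70% reference line, the sketch below computes parallel efficiency under the standard definitions, which we assume here because the text does not spell them out: for strong scaling, E = (n₀ t₀) / (n t) relative to the smallest run; for weak scaling with fixed work per node, E = t₀ / t. The function names and all timing values are hypothetical and are not measurements from this paper.

```python
def strong_scaling_efficiency(nodes, times):
    """Efficiency relative to the smallest run: E = (n0 * t0) / (n * t)."""
    n0, t0 = nodes[0], times[0]
    return [(n0 * t0) / (n * t) for n, t in zip(nodes, times)]


def weak_scaling_efficiency(times):
    """Fixed work per node: E = t0 / t, where t0 is the smallest run."""
    t0 = times[0]
    return [t0 / t for t in times]


# Hypothetical wall-clock times in seconds (NOT data from the paper).
nodes = [1, 2, 4, 8]
strong_times = [80.0, 50.0, 30.0, 14.0]   # fixed total problem size
weak_times = [10.0, 11.0, 12.5, 14.0]     # fixed number of points per node

strong_eff = strong_scaling_efficiency(nodes, strong_times)
weak_eff = weak_scaling_efficiency(weak_times)

for n, es, ew in zip(nodes, strong_eff, weak_eff):
    print(f"{n:3d} nodes: strong {es:.2f}, weak {ew:.2f}")
```

In both definitions a value of 1 means ideal scaling, so a curve that stays near 0.7 corresponds to the dashed reference line mentioned above.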

About this article

Cite this article

Yan, W., Blackwell, R. Kernel aggregated fast multipole method. Adv Comput Math 47, 69 (2021). https://doi.org/10.1007/s10444-021-09896-1

Received: 16 February 2021
Accepted: 16 August 2021
Published: 06 September 2021
DOI: https://doi.org/10.1007/s10444-021-09896-1
