Kernel aggregated fast multipole method

Efficient summation of Laplace and Stokes kernel functions

Advances in Computational Mathematics

Abstract

Many different simulation methods for Stokes flow problems involve a common, computationally intensive task: the summation of a kernel function over O(N²) pairs of points. One popular technique is the kernel independent fast multipole method (KIFMM), which constructs a spatially adaptive octree for all points, places a small number of multipole and local equivalent points around each octree box, and completes the kernel sum with O(N) cost using these equivalent points. Simpler kernels can be used between these equivalent points to improve the efficiency of KIFMM. Here we present further extensions and applications of this idea to enable efficient summations and flexible boundary conditions for various kernels. We call our method the kernel aggregated fast multipole method (KAFMM), because it uses different kernel functions at different stages of octree traversal. We have implemented our method as an open-source software library, STKFMM, based on the high-performance library PVFMM, with support for Laplace kernels, the Stokeslet, regularized Stokeslet, Rotne-Prager-Yamakawa (RPY) tensor, and the Stokes double-layer and traction operators. Open and periodic boundary conditions are supported for all kernels, and the no-slip wall boundary condition is supported for the Stokeslet and RPY tensor. The package is designed to be ready to use as well as readily extensible to additional kernels.
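To make the O(N²) task concrete, the following is a minimal sketch of the direct (brute-force) Stokeslet sum that a fast multipole method replaces. It is not taken from STKFMM. The function name `stokeslet_direct_sum`, its argument layout, and the default viscosity are illustrative assumptions, and the kernel is the standard free-space Stokeslet u(x) = 1/(8πμ) Σ [f/r + (r·f) r/r³].

```python
import math


def stokeslet_direct_sum(sources, forces, targets, viscosity=1.0):
    """Brute-force free-space Stokeslet sum (hypothetical helper, not the STKFMM API).

    Every target visits every source, so the cost is
    O(len(targets) * len(sources)).
    """
    prefactor = 1.0 / (8.0 * math.pi * viscosity)
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for y, f in zip(sources, forces):
            r = [x[k] - y[k] for k in range(3)]
            r2 = sum(c * c for c in r)
            if r2 == 0.0:
                continue  # skip the singular self-interaction
            inv_r = 1.0 / math.sqrt(r2)
            r_dot_f = sum(r[k] * f[k] for k in range(3))
            for k in range(3):
                # f / r  +  (r . f) r / r^3
                u[k] += prefactor * (f[k] * inv_r + r_dot_f * r[k] * inv_r ** 3)
        velocities.append(u)
    return velocities


# One unit force along x at the origin, evaluated at two targets.
u = stokeslet_direct_sum(
    sources=[(0.0, 0.0, 0.0)],
    forces=[(1.0, 0.0, 0.0)],
    targets=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
)
```

A direct sum like this is mainly useful as a reference for checking the accuracy of a fast method on small problems, since its cost grows quadratically with the number of points.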

References

  1. Cortez, R.: Regularized Stokeslet segments. J. Comput. Phys. 375, 783–796 (2018). https://doi.org/10.1016/j.jcp.2018.08.055

  2. Cortez, R., Fauci, L., Medovikov, A.: The method of regularized Stokeslets in three dimensions: Analysis, validation, and application to helical swimming. Phys. Fluids 17(3), 031504 (2005). https://doi.org/10.1063/1.1830486

  3. Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987). https://doi.org/10.1016/0021-9991(87)90140-9

  4. Guan, W., Cheng, X., Huang, J., Huber, G., Li, W., McCammon, J. A., Zhang, B.: RPYFMM: Parallel adaptive fast multipole method for Rotne–Prager–Yamakawa tensor in biomolecular hydrodynamics simulations. Comput. Phys. Commun. 227, 99–108 (2018). https://doi.org/10.1016/j.cpc.2018.02.005

  5. Klinteberg, L. A., Shamshirgar, D. S., Tornberg, A. K.: Fast Ewald summation for free-space Stokes potentials. Res. Math. Sci. 4(1), 1 (2017). https://doi.org/10.1186/s40687-016-0092-7

  6. LaGrone, J., Cortez, R., Yan, W., Fauci, L.: Complex dynamics of long, flexible fibers in shear. J. Non-Newtonian Fluid Mech. 269, 73–81 (2019). https://doi.org/10.1016/j.jnnfm.2019.06.007

  7. Liang, Z., Gimbutas, Z., Greengard, L., Huang, J., Jiang, S.: A fast multipole method for the Rotne–Prager–Yamakawa tensor and its applications. J. Comput. Phys. 234, 133–139 (2013). https://doi.org/10.1016/j.jcp.2012.09.021

  8. Lindbo, D., Tornberg, A. K.: Fast and spectrally accurate Ewald summation for 2-periodic electrostatic systems. J. Chem. Phys. 136(16), 164111 (2012). https://doi.org/10.1063/1.4704177

  9. Malhotra, D., Biros, G.: PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials. Commun. Comput. Phys. 18(3), 808–830 (2015). https://doi.org/10.4208/cicp.020215.150515sw

  10. Mizerski, K. A., Wajnryb, E., Zuk, P. J., Szymczak, P.: The Rotne-Prager-Yamakawa approximation for periodic systems in a shear flow. J. Chem. Phys. 140(18), 184103 (2014). https://doi.org/10.1063/1.4871113

  11. Olson, S. D., Lim, S., Cortez, R.: Modeling the dynamics of an elastic rod with intrinsic curvature and twist using a regularized Stokes formulation. J. Comput. Phys. 238, 169–187 (2013). https://doi.org/10.1016/j.jcp.2012.12.026

  12. Rostami, M. W., Olson, S. D.: Kernel-independent fast multipole method within the framework of regularized Stokeslets. J. Fluids Struct. 67, 60–84 (2016). https://doi.org/10.1016/j.jfluidstructs.2016.07.006

  13. Rotne, J., Prager, S.: Variational treatment of hydrodynamic interaction in polymers. J. Chem. Phys. 50(11), 4831–4837 (1969). https://doi.org/10.1063/1.1670977

  14. Shamshirgar, D. S., Tornberg, A. K.: The spectral Ewald method for singly periodic domains. J. Comput. Phys. 347, 341–366 (2017). https://doi.org/10.1016/j.jcp.2017.07.001

  15. Srinivasan, S., Tornberg, A. K.: Fast Ewald summation for Green’s functions of Stokes flow in a half-space. Res. Math. Sci. 5(3), 35 (2018). https://doi.org/10.1007/s40687-018-0153-1

  16. Swan, J. W., Brady, J. F.: Anisotropic diffusion in confined colloidal dispersions: the evanescent diffusivity. J. Chem. Phys. 135(1), 014701 (2011). https://doi.org/10.1063/1.3604530

  17. Tornberg, A. K.: The Ewald sums for singly, doubly and triply periodic electrostatic systems. Adv. Comput. Math. 42(1), 227–248 (2015). https://doi.org/10.1007/s10444-015-9422-3

  18. Wajnryb, E., Mizerski, K. A., Zuk, P. J., Szymczak, P.: Generalization of the Rotne–Prager–Yamakawa mobility and shear disturbance tensors. J. Fluid Mech. 731 (2013). https://doi.org/10.1017/jfm.2013.402

  19. Wang, L.: A Kernel-Independent Treecode for General Rotne-Prager-Yamakawa Tensor. Adv. Appl. Math. Mech. 13(2), 296–310 (2021). https://doi.org/10.4208/aamm.OA-2019-0322

  20. Wang, M., Brady, J. F.: Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions. J. Comput. Phys. 306, 443–477 (2016). https://doi.org/10.1016/j.jcp.2015.11.042

  21. Yamakawa, H.: Transport properties of polymer chains in dilute solution: Hydrodynamic interaction. J. Chem. Phys. 53(1), 436–443 (1970). https://doi.org/10.1063/1.1673799

  22. Yan, W., Brady, J. F.: The behavior of active diffusiophoretic suspensions: An accelerated Laplacian dynamics study. J. Chem. Phys. 145(13), 134902 (2016). https://doi.org/10.1063/1.4963722

  23. Yan, W., Corona, E., Malhotra, D., Veerapaneni, S., Shelley, M.: A scalable computational platform for particulate Stokes suspensions. J. Comput. Phys. 416, 109524 (2020). https://doi.org/10.1016/j.jcp.2020.109524

  24. Yan, W., Shelley, M.: Flexibly imposing periodicity in kernel independent FMM: a multipole-to-local operator approach. J. Comput. Phys. 355, 214–232 (2018). https://doi.org/10.1016/j.jcp.2017.11.012

  25. Yan, W., Shelley, M.: Universal image systems for non-periodic and periodic Stokes flows above a no-slip wall. J. Comput. Phys. 375, 263–270 (2018). https://doi.org/10.1016/j.jcp.2018.08.041

  26. Ying, L., Biros, G., Zorin, D.: A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196(2), 591–626 (2004). https://doi.org/10.1016/j.jcp.2003.11.021

  27. Zuk, P. J., Wajnryb, E., Mizerski, K. A., Szymczak, P.: Rotne-Prager-Yamakawa approximation for different-sized particles in application to macromolecular bead models. J. Fluid Mech. 741 (2014). https://doi.org/10.1017/jfm.2013.668

Acknowledgements

We thank Dhairya Malhotra and Alex Barnett for inspiring discussions, and thank Bryce Palmer for comments on the manuscript.

Author information

Corresponding author

Correspondence to Wen Yan.

Additional information

Communicated by: Michael O’Neil

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Advances in Computational Integral Equations

Guest Editors: Stephanie Chaillat, Adrianna Gillman, Per-Gunnar Martinsson, Michael O’Neil, Mary-Catherine Kropinski, Timo Betcke, Alex Barnett

Appendices

Appendix A. All kernels supported by the open-source STKFMM package

Table 12 All kernels currently supported by the STKFMM package. All are supported for open boundary conditions and for singly, doubly, and triply periodic boundary conditions.

Appendix B. Overhead of periodic boundary condition

In contrast to our previous implementation [24], in which the M2L operator that imposes the periodic boundary condition was applied as a post-processing step, we have merged this step into the downward pass (L2L and L2T) of KIFMM itself, with the help of Dr. Malhotra. The extra overhead of running KIFMM with periodicity, compared to the open boundary condition, is therefore significantly reduced. Figure 9 compares the tree construction time and the FMM evaluation time for four different kernels under different boundary conditions. The overhead of imposing periodic boundary conditions is usually 5% of the evaluation time with the open boundary condition.

Fig. 9 Timing results for selected kernels with different periodic boundary conditions. None, PX, PXY, and PXYZ refer to open boundary, singly periodic, doubly periodic, and triply periodic, respectively

Appendix C. Note about our SIMD implementation

Although the benchmarks presented in Section 3 were on CPUs (Intel Skylake and Cascade Lake) that support AVX-512 instructions, our kernels are hand-optimized with AVX2 instructions. Since the AVX-512 instruction set is still evolving and new subsets of instructions are still being added, we plan to support AVX-512 in the future when the instruction set is more stable.

Appendix D. Parallel scaling

Detailed parallel scaling benchmarks and analysis are beyond the scope of this paper because we implemented our package based on the PVFMM library without modifying any of its advanced parallelization facilities. Since the parallel scaling of PVFMM has been thoroughly tested [9], we present only some brief test results here. We tested strong and weak scaling for the low-precision case m = 8 and the high-precision case m = 12. The scaling benchmarks were measured on a cluster where each node was equipped with two Intel Cascade Lake 8268 CPUs @ 2.9 GHz. Each CPU had 24 cores, hyper-threading was disabled, and the machines were set to “performance” mode. We launched two MPI ranks on each CPU. Each MPI rank launched 12 OpenMP threads and each thread was bound to one CPU core. The number of source and target points was increased compared to the convergence tests in the previous sections, and we increased the box size to L = 100. We again used a non-uniform distribution of points picked from the log-normal distribution in Eq. (36).
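The node layout described above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch. The variable names are illustrative, and the numbers are the ones quoted in the paragraph above.

```python
# Hardware and launch configuration as quoted in the text.
cpus_per_node = 2
cores_per_cpu = 24
mpi_ranks_per_cpu = 2
omp_threads_per_rank = 12

# Derived quantities (illustrative names, not PVFMM or STKFMM settings).
cores_per_node = cpus_per_node * cores_per_cpu
mpi_ranks_per_node = cpus_per_node * mpi_ranks_per_cpu
threads_per_node = mpi_ranks_per_node * omp_threads_per_rank

# One thread per physical core means the node is fully subscribed
# without oversubscription.
fully_subscribed = threads_per_node == cores_per_node
```

With these values each node runs 4 MPI ranks and 48 threads, one per physical core.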

We tested strong scaling using the RPY kernel with a doubly periodic boundary condition and 2.7 × 10⁷ source and target points. Figure 10a shows the results, which are comparable to 70% parallel efficiency (dashed black line). We tested weak scaling using the StokesPVel kernel GP with 4 million target points per node, i.e., 1 million points for each MPI rank. As shown in Fig. 10b, the run-time results are again comparable to 70% parallel efficiency. These results are completely determined by the PVFMM library and are close to its benchmarks [9].
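For readers who want to reproduce this kind of comparison, the sketch below computes parallel efficiency from measured run times using the conventional definitions of strong and weak scaling. It is an assumption that the 70% reference line in the figure uses the same definitions. The function names are hypothetical and the timings are made-up placeholders, not measurements from this paper.

```python
def strong_scaling_efficiency(t_base, n_base, t, n):
    """Fixed total problem size: the ideal run time falls as n_base / n."""
    return (t_base * n_base) / (t * n)


def weak_scaling_efficiency(t_base, t):
    """Fixed work per node: the ideal run time stays constant."""
    return t_base / t


# Hypothetical timings in seconds, for illustration only.
nodes = [1, 2, 4, 8]
strong_times = [80.0, 44.0, 25.0, 14.0]
weak_times = [10.0, 11.0, 12.5, 14.0]

strong_eff = [
    strong_scaling_efficiency(strong_times[0], nodes[0], t, n)
    for t, n in zip(strong_times, nodes)
]
weak_eff = [weak_scaling_efficiency(weak_times[0], t) for t in weak_times]

# Flag the runs that stay at or above a 70% efficiency reference.
meets_reference = [e >= 0.7 for e in strong_eff]
```

Plotting `strong_times` against `nodes` on log-log axes, together with a line of constant efficiency, gives the usual visual comparison for a strong scaling test.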

Fig. 10 Strong and weak scaling benchmarks for our KAFMM package. Each node has 48 cores. The left panel shows strong scaling results for a fixed-size RPY tensor problem with 2.7 × 10⁷ source and target points, with doubly periodic (DP) boundary conditions. The right panel shows weak scaling results for a variable-size StokesPVel problem with 4 × 10⁶ points per node, with non-periodic (NP) boundary conditions. In both cases the dashed line is a reference line of 70% parallel efficiency

Cite this article

Yan, W., Blackwell, R. Kernel aggregated fast multipole method. Adv Comput Math 47, 69 (2021). https://doi.org/10.1007/s10444-021-09896-1
