Abstract
Many different simulation methods for Stokes flow problems involve a common computationally intense task: the summation of a kernel function over O(N^2) pairs of points. One popular technique is the kernel independent fast multipole method (KIFMM), which constructs a spatially adaptive octree for all points, places a small number of equivalent multipole and local points around each octree box, and uses these equivalent points to complete the kernel sum with O(N) cost. Simpler kernels can be used between these equivalent points to improve the efficiency of KIFMM. Here we present further extensions and applications of this idea, to enable efficient summations and flexible boundary conditions for various kernels. We call our method the kernel aggregated fast multipole method (KAFMM), because it uses different kernel functions at different stages of octree traversal. We have implemented our method as an open-source software library STKFMM based on the high-performance library PVFMM, with support for Laplace kernels, the Stokeslet, regularized Stokeslet, Rotne-Prager-Yamakawa (RPY) tensor, and the Stokes double-layer and traction operators. Open and periodic boundary conditions are supported for all kernels, and the no-slip wall boundary condition is supported for the Stokeslet and RPY tensor. The package is designed to be ready to use as well as readily extensible to additional kernels.
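To make the O(N^2) cost concrete, the following is a minimal sketch of the direct pairwise Stokeslet sum that a fast multipole method is meant to replace. It assumes the standard free-space Stokeslet, u(x) = 1/(8πμ) Σ_j [ f_j / r + r (r · f_j) / r^3 ] with r = x − y_j. The function name `stokeslet_direct` and its arguments are hypothetical and are not part of the STKFMM or PVFMM interfaces.

```python
import math


def stokeslet_direct(sources, forces, targets, viscosity=1.0):
    """Direct Stokeslet sum; cost is len(targets) * len(sources) kernel evaluations.

    Illustrative only: a hypothetical helper, not the STKFMM API.
    """
    prefactor = 1.0 / (8.0 * math.pi * viscosity)
    velocities = []
    for x in targets:                      # loop over target points
        u = [0.0, 0.0, 0.0]
        for y, f in zip(sources, forces):  # loop over source points
            r = [x[k] - y[k] for k in range(3)]
            r2 = sum(c * c for c in r)
            if r2 == 0.0:
                continue                   # skip the singular self-interaction
            inv_r = 1.0 / math.sqrt(r2)
            r_dot_f = sum(r[k] * f[k] for k in range(3))
            for k in range(3):
                u[k] += prefactor * (f[k] * inv_r + r[k] * r_dot_f * inv_r**3)
        velocities.append(u)
    return velocities


# One unit force along x at the origin, evaluated at two targets at distance 1.
u = stokeslet_direct([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)],
                     [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

The nested loops visit every target-source pair, so doubling the number of points quadruples the work. This scaling is what the octree-based summation reduces to O(N).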
References
Cortez, R.: Regularized Stokeslet segments. J. Comput. Phys. 375, 783–796 (2018). https://doi.org/10.1016/j.jcp.2018.08.055
Cortez, R., Fauci, L., Medovikov, A.: The method of regularized Stokeslets in three dimensions: Analysis, validation, and application to helical swimming. Phys. Fluids 17(3), 031504 (2005). https://doi.org/10.1063/1.1830486
Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987). https://doi.org/10.1016/0021-9991(87)90140-9
Guan, W., Cheng, X., Huang, J., Huber, G., Li, W., McCammon, J. A., Zhang, B.: RPYFMM: Parallel adaptive fast multipole method for Rotne–Prager–Yamakawa tensor in biomolecular hydrodynamics simulations. Comput. Phys. Commun. 227, 99–108 (2018). https://doi.org/10.1016/j.cpc.2018.02.005
Klinteberg, L. A., Shamshirgar, D. S., Tornberg, A. K.: Fast Ewald summation for free-space Stokes potentials. Res. Math. Sci. 4(1), 1 (2017). https://doi.org/10.1186/s40687-016-0092-7
LaGrone, J., Cortez, R., Yan, W., Fauci, L.: Complex dynamics of long, flexible fibers in shear. J. Non-Newtonian Fluid Mech. 269, 73–81 (2019). https://doi.org/10.1016/j.jnnfm.2019.06.007
Liang, Z., Gimbutas, Z., Greengard, L., Huang, J., Jiang, S.: A fast multipole method for the Rotne–Prager–Yamakawa tensor and its applications. J. Comput. Phys. 234(Supplement C), 133–139 (2013). https://doi.org/10.1016/j.jcp.2012.09.021
Lindbo, D., Tornberg, A. K.: Fast and spectrally accurate Ewald summation for 2-periodic electrostatic systems. J. Chem. Phys. 136(16), 164111 (2012). https://doi.org/10.1063/1.4704177
Malhotra, D., Biros, G.: PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials. Commun. Comput. Phys. 18(3), 808–830 (2015). https://doi.org/10.4208/cicp.020215.150515sw
Mizerski, K. A., Wajnryb, E., Zuk, P. J., Szymczak, P.: The Rotne-Prager-Yamakawa approximation for periodic systems in a shear flow. J. Chem. Phys. 140(18), 184103 (2014). https://doi.org/10.1063/1.4871113
Olson, S. D., Lim, S., Cortez, R.: Modeling the dynamics of an elastic rod with intrinsic curvature and twist using a regularized Stokes formulation. J. Comput. Phys. 238, 169–187 (2013). https://doi.org/10.1016/j.jcp.2012.12.026
Rostami, M. W., Olson, S. D.: Kernel-independent fast multipole method within the framework of regularized Stokeslets. J. Fluids Struct. 67, 60–84 (2016). https://doi.org/10.1016/j.jfluidstructs.2016.07.006
Rotne, J., Prager, S.: Variational treatment of hydrodynamic interaction in polymers. J. Chem. Phys. 50(11), 4831–4837 (1969). https://doi.org/10.1063/1.1670977
Shamshirgar, D. S., Tornberg, A. K.: The spectral Ewald method for singly periodic domains. J. Comput. Phys. 347(Supplement C), 341–366 (2017). https://doi.org/10.1016/j.jcp.2017.07.001
Srinivasan, S., Tornberg, A. K.: Fast Ewald summation for Green’s functions of Stokes flow in a half-space. Res. Math. Sci. 5(3), 35 (2018). https://doi.org/10.1007/s40687-018-0153-1
Swan, J. W., Brady, J. F.: Anisotropic diffusion in confined colloidal dispersions: the evanescent diffusivity. J. Chem. Phys. 135(1), 014701 (2011). https://doi.org/10.1063/1.3604530
Tornberg, A. K.: The Ewald sums for singly, doubly and triply periodic electrostatic systems. Adv. Comput. Math. 42(1), 227–248 (2015). https://doi.org/10.1007/s10444-015-9422-3
Wajnryb, E., Mizerski, K. A., Zuk, P. J., Szymczak, P.: Generalization of the Rotne–Prager–Yamakawa mobility and shear disturbance tensors. J. Fluid Mech. 731 (2013). https://doi.org/10.1017/jfm.2013.402
Wang, L.: A Kernel-Independent Treecode for General Rotne-Prager-Yamakawa Tensor. Adv. Appl. Math. Mech. 13(2), 296–310 (2021). https://doi.org/10.4208/aamm.OA-2019-0322
Wang, M., Brady, J. F.: Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions. J. Comput. Phys. 306, 443–477 (2016). https://doi.org/10.1016/j.jcp.2015.11.042
Yamakawa, H.: Transport properties of polymer chains in dilute solution: Hydrodynamic interaction. J. Chem. Phys. 53(1), 436–443 (1970). https://doi.org/10.1063/1.1673799
Yan, W., Brady, J. F.: The behavior of active diffusiophoretic suspensions: An accelerated Laplacian dynamics study. J. Chem. Phys. 145(13), 134902 (2016). https://doi.org/10.1063/1.4963722
Yan, W., Corona, E., Malhotra, D., Veerapaneni, S., Shelley, M.: A scalable computational platform for particulate Stokes suspensions. J. Comput. Phys. 416, 109524 (2020). https://doi.org/10.1016/j.jcp.2020.109524
Yan, W., Shelley, M.: Flexibly imposing periodicity in kernel independent FMM: a multipole-to-local operator approach. J. Comput. Phys. 355, 214–232 (2018). https://doi.org/10.1016/j.jcp.2017.11.012
Yan, W., Shelley, M.: Universal image systems for non-periodic and periodic Stokes flows above a no-slip wall. J. Comput. Phys. 375, 263–270 (2018). https://doi.org/10.1016/j.jcp.2018.08.041
Ying, L., Biros, G., Zorin, D.: A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196(2), 591–626 (2004). https://doi.org/10.1016/j.jcp.2003.11.021
Zuk, P. J., Wajnryb, E., Mizerski, K. A., Szymczak, P.: Rotne-Prager-Yamakawa approximation for different-sized particles in application to macromolecular bead models. J. Fluid Mech. 741 (2014). https://doi.org/10.1017/jfm.2013.668
Acknowledgements
We thank Dhairya Malhotra and Alex Barnett for inspiring discussions, and thank Bryce Palmer for comments on the manuscript.
Additional information
Communicated by: Michael O’Neil
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Advances in Computational Integral Equations
Guest Editors: Stephanie Chaillat, Adrianna Gillman, Per-Gunnar Martinsson, Michael O’Neil, Mary-Catherine Kropinski, Timo Betcke, Alex Barnett
Appendices
Appendix A. All kernels supported by the open-source STKFMM package
Appendix B. Overhead of periodic boundary condition
In contrast to our previous implementation [24], where the M → L operator that imposes the periodic boundary condition was applied as a separate post-processing step, we have now merged this step into the downward pass (L → L and L → T) of KIFMM itself, with the help of Dr. Malhotra. As a result, the extra overhead of running KIFMM with periodicity, compared to the open boundary condition, is significantly reduced. Figure 9 compares the tree construction time and FMM evaluation time for four different kernels under different boundary conditions. The overhead of imposing periodic boundary conditions is typically 5% of the evaluation time under the open (free-space) boundary condition.
Appendix C. Note about our SIMD implementation
Although the benchmarks presented in Section 3 were run on CPUs (Intel Skylake and Cascade Lake) that support AVX-512 instructions, our kernels are hand-optimized with AVX2 instructions. Because the AVX-512 instruction set is still evolving and new subsets of instructions are still being added, we plan to support AVX-512 in the future, once the instruction set is more stable.
Appendix D. Parallel scaling
Detailed parallel scaling benchmarks and analysis are beyond the scope of this paper because we implemented our package based on the PVFMM library without modifying any of its advanced parallelization facilities. Since the parallel scaling of PVFMM has been thoroughly tested [9], we present only some brief test results here. We tested strong and weak scaling for the low-precision case m = 8 and the high-precision case m = 12. The scaling benchmarks were measured on a cluster where each node was equipped with two Intel Cascade Lake 8268 CPUs @ 2.9 GHz. Each CPU had 24 cores, hyper-threading was disabled, and the machines were set to “performance” mode. We launched two MPI ranks on each CPU. Each MPI rank launched 12 OpenMP threads and each thread was bound to one CPU core. The number of source and target points was increased compared to the convergence tests in the previous sections, and we increased the box size to L = 100. We again used a non-uniform distribution of points picked from the log-normal distribution in Eq. (36).
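The hybrid MPI/OpenMP layout described above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the class `NodeLayout` and its field names are hypothetical and do not correspond to any option of STKFMM, PVFMM, or an MPI launcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical description of the hybrid MPI/OpenMP layout of one node."""

    cpus_per_node: int = 2      # two Cascade Lake 8268 CPUs per node
    cores_per_cpu: int = 24     # hyper-threading disabled
    ranks_per_cpu: int = 2      # two MPI ranks launched on each CPU
    threads_per_rank: int = 12  # OpenMP threads per rank, one bound to each core

    @property
    def cores_per_node(self) -> int:
        return self.cpus_per_node * self.cores_per_cpu

    @property
    def ranks_per_node(self) -> int:
        return self.cpus_per_node * self.ranks_per_cpu

    @property
    def threads_per_node(self) -> int:
        return self.ranks_per_node * self.threads_per_rank

    def one_thread_per_core(self) -> bool:
        # True when every core runs exactly one OpenMP thread.
        return self.threads_per_node == self.cores_per_node


layout = NodeLayout()
print(layout.ranks_per_node, layout.threads_per_node, layout.one_thread_per_core())
# prints: 4 48 True
```

With these settings each node hosts 4 MPI ranks and 48 OpenMP threads, which matches its 48 physical cores, so no core is oversubscribed.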
We tested strong scaling using the RPY kernel with a doubly periodic boundary condition and 2.7 × 10^7 target points and source points. Figure 10a shows the results, which are comparable to 70% parallel efficiency (dashed black line). We tested weak scaling using the StokesPVel kernel GP with 4 million target points per node, i.e., 1 million points for each MPI rank. As shown in Fig. 10b, the t_run results are again comparable to 70% parallel efficiency. These results are completely determined by the PVFMM library, and are close to its benchmarks [9].
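For readers who want to reproduce a reference line of this kind, the sketch below uses the conventional definitions of strong- and weak-scaling parallel efficiency. These definitions are an assumption, since the text does not spell them out. The function names are hypothetical, and the timings in the example are made-up placeholders, not measured values from this work.

```python
import math


def strong_scaling_efficiency(t_ref, nodes_ref, t, nodes):
    """Fixed total problem size: ideal run time scales as 1 / nodes."""
    return (t_ref * nodes_ref) / (t * nodes)


def weak_scaling_efficiency(t_ref, t):
    """Fixed problem size per node: ideal run time stays constant."""
    return t_ref / t


def strong_scaling_reference_time(t_ref, nodes_ref, nodes, efficiency=0.7):
    """Run time at a given constant parallel efficiency, e.g. a 70% reference line."""
    return t_ref * nodes_ref / (nodes * efficiency)


# Points per MPI rank in the weak-scaling test: 4 million per node, 4 ranks per node.
points_per_rank = 4_000_000 // 4

# Placeholder timings in seconds (hypothetical, for illustration only).
t_one_node = 100.0
t_eight_nodes = strong_scaling_reference_time(t_one_node, 1, 8)
eff = strong_scaling_efficiency(t_one_node, 1, t_eight_nodes, 8)
```

By construction, a run that lands exactly on the reference line has `eff` equal to 0.7. Measured timings above the line correspond to lower efficiency, and timings below it to higher efficiency.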
Cite this article
Yan, W., Blackwell, R. Kernel aggregated fast multipole method. Adv Comput Math 47, 69 (2021). https://doi.org/10.1007/s10444-021-09896-1
DOI: https://doi.org/10.1007/s10444-021-09896-1