Abstract
A basic heterogeneous parallel Red–Black successive over-relaxation (SOR) implementation, the mono-color floating-point scheme, was developed for graphics processing units (GPUs) on the OpenCL platform. Built on fine granularity, a compact data structure, and a stencil function, the scheme uses a concise mapping relationship to describe implicitly the complex rules for locating neighbor elements, which avoids the low GPU utilization of the traditional Red–Black SOR scheme. The new mono-color floating-point scheme was applied to build a fast Semi-Implicit Method for Pressure Linked Equations (SIMPLE) solver with OpenCL and OpenMP on a heterogeneous parallel computing device. Compared with a SIMPLE solver using the traditional Red–Black SOR scheme, the new scheme runs 1.7 to 1.8 times faster on the same GPU. It also eliminates the complex searching module of the mono-color logical scheme and outperforms that scheme by 20–30%. Numerical cases in double precision showed that the SIMPLE solver on the GPU with the new Red–Black SOR scheme could save up to 92% of the computing time of the serial solver on the CPU.
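For readers unfamiliar with the underlying iteration, the following is a minimal serial sketch of a generic textbook Red–Black SOR sweep on a five-point stencil. It is not the authors' OpenCL mono-color floating-point scheme, whose mapping relationship is not given in the abstract. The function names `red_black_sor` and `make_grid`, the relaxation factor of 1.5, and the Laplace test problem are assumptions chosen for illustration. The sketch shows only why the two colors decouple: every cell of one color depends solely on cells of the other color, so each half-sweep could be updated in parallel.

```python
def make_grid(n):
    """Hypothetical test problem: n x n grid whose boundary holds phi = i + j."""
    grid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                grid[i][j] = float(i + j)
    return grid


def red_black_sor(phi, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Relax the 2D Laplace equation in place; return the number of sweeps used.

    Boundary rows and columns of phi are treated as fixed Dirichlet values.
    """
    rows, cols = len(phi), len(phi[0])
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        # color 0: cells with (i + j) even; color 1: cells with (i + j) odd.
        for color in (0, 1):
            for i in range(1, rows - 1):
                # First interior column of this color in row i, computed
                # arithmetically instead of by searching for it.
                j_start = 1 + ((i + 1 + color) % 2)
                for j in range(j_start, cols - 1, 2):
                    # All four neighbors belong to the other color.
                    average = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                                      + phi[i][j - 1] + phi[i][j + 1])
                    change = omega * (average - phi[i][j])
                    phi[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:
            return sweep
    return max_sweeps
```

Calling `red_black_sor(make_grid(9))` relaxes the interior toward the linear field `i + j`, which the five-point stencil reproduces exactly. In a GPU version the two innermost loops of each color would typically become one work-item per cell, but that mapping is left out here because it depends on the storage layout.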
Abbreviations
- Bi, Ri: The black element, the red element
- C: Specific heat (J/(kg K))
- CB, MB, SB: Coefficients for black elements
- CR, MR, SR: Coefficients for red elements
- Ek, Wk, Nk, Sk: Boundary elements
- g: Gravitational acceleration (m/s²)
- g: Thermal conductivity (W (m K)⁻¹)
- L: Length (m)
- L5, si: The stencil function
- M, N, n: Size of the computational matrix
- Nu: Nusselt number
- Nux0: Local Nusselt number
- Numean: Mean Nusselt number
- p: Pressure (Pa)
- Pr: Prandtl number
- Ra: Rayleigh number
- Re: Reynolds number
- S: Source item
- T: Temperature (K)
- T_C: Temperature of cold wall (K)
- T_H: Temperature of hot wall (K)
- T_m: Reference temperature (K)
- T*: Dimensionless temperature
- t: Time (s)
- U: Velocity vector
- u: Velocity in the coordinate x (m s⁻¹)
- u*: Dimensionless velocity of u
- v: Velocity in the coordinate y (m s⁻¹)
- x, y: Cartesian coordinate (m)
- β: Coefficient of thermal expansion (K⁻¹)
- ρ: Density (kg/m³)
- μ: Dynamic viscosity (Pa s)
- ϕ: Generic variable
- i: Index
- E, W, N, S, NB: Index of neighbor element
- P: Index of central element
- C: Cold wall
- H: Hot wall
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 51276199).
Cite this article
Li, R., Gong, L. & Xu, M. A heterogeneous parallel Red–Black SOR technique and the numerical study on SIMPLE. J Supercomput 76, 9585–9608 (2020). https://doi.org/10.1007/s11227-020-03221-1
DOI: https://doi.org/10.1007/s11227-020-03221-1