Evaluating the performance of FFT library implementations on modern hybrid computing systems

Malkovsky, Sergey I.; Sorokin, Aleksei A.; Tsoy, Georgiy I.; Korolev, Sergey P.; Smagin, Sergey I.; Kondrashev, Vadim A.

doi:10.1007/s11227-020-03591-6

Published: 20 January 2021

The Journal of Supercomputing, Volume 77, pages 8326–8354 (2021)

Abstract

The fast Fourier transform (FFT) is widely used to solve scientific and engineering problems. In particular, it underlies software for speech and image recognition, signal analysis, and modeling the properties of new materials and substances. Newly emerging high-performance hybrid computing systems, as well as systems with alternative architectures, call for research into how efficiently the discrete Fourier transform can be computed on these platforms. The results of such research help assess the feasibility of particular solutions for building modern computing and data processing centers. This paper presents such a study for modern hybrid computing systems based on IBM POWER and Intel Xeon processors and on NVIDIA Tesla co-processors. Their performance when executing fast Fourier transforms is analyzed, and conclusions are presented. The impact of architectural features of the hardware (CPU simultaneous multithreading mode, GPU data transfer bus, etc.) on transform performance is assessed. The results are used to provide recommendations on the optimal operation modes and settings of the considered mathematical libraries.

References

Brodtkorb AR, Dyken C, Hagen TR, Hjelmervik JM, Storaasli OO (2010) State-of-the-art in heterogeneous computing. Sci Progr 18(1):1–33. https://doi.org/10.1155/2010/540159
Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301
Stanković D, Jovanović P, Jović A, Slavnić V, Vudragović D, Balaž A (2014) Implementation and Benchmarking of New FFT Libraries in Quantum ESPRESSO. In: Dulea M, Karaivanova A, Oulas A, Liabotis I, Stojiljkovic D, Prnjat O (eds) High-Performance Computing Infrastructure for South East Europe’s Research Communities, Modeling and Optimization in Science and Technologies, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-319-01520-0_19
Wende F, Marsman M, Steinke T (2016) On Enhancing 3D-FFT Performance in VASP. In: CUG proceedings
Bailey DH, Barszcz E, Barton JT, Browning DS, Carter RL, Dagum L, Fatoohi RA, Frederickson PO, Lasinski TA, Schreiber RS, Simon HD, Venkatakrishnan V, Weeratunga SK (1991) The NAS parallel benchmarks. Int J Supercomput Appl 5(3):63–73. https://doi.org/10.1177/109434209100500306
Luszczek P, Dongarra J, Koester D, Rabenseifner R, Lucas B, Kepner J, McCalpin J, Bailey D, Takahashi D (2005) Introduction to the HPC Challenge Benchmark Suite. Lawrence Berkeley National Laboratory. Paper LBNL-57493, 12 pp
Park Y-S, Park K-R, Kim J-M, Jeong H-Y (2017) Fast Fourier transform benchmark on X86 Xeon system for multimedia data processing. Multimed Tools Appl 76(4):6015–6030. https://doi.org/10.1007/s11042-015-2843-7
Jodra JL, Gurrutxaga I, Muguerza J (2015) A study of memory consumption and execution performance of the cuFFT library. In: 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC). IEEE, pp 323–327. https://doi.org/10.1109/3PGCIC.2015.66
Střelák D, Filipovič J (2018) Performance analysis and autotuning setup of the cuFFT library. In: Proceedings of the 2nd Workshop on Autotuning and Adaptivity Approaches for Energy Efficient HPC Systems—ANDARE ’18. ACM Press, New York, pp 1–6. https://doi.org/10.1145/3295816.3295817
Govindaraju NK, Lloyd B, Dotsenko Y, Smith B, Manferdelli J (2008) High performance discrete Fourier transforms on graphics processors. In: 2008 SC: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, pp 1–12. https://doi.org/10.1109/SC.2008.5213922
Smagin SI, Sorokin AA, Malkovsky SI, Korolev SP, Lukyanova OA, Nikitin OY, Kondrashev VA, Chernykh VY (2019) The organization of effective multi-user operation of hybrid computing systems. Comput Technol 5(24):49–60. https://doi.org/10.25743/ICT.2019.24.5.005
Mal’kovskii SI, Sorokin AA, Korolev SP, Zatsarinnyi AA, Tsoi GI (2019) Performance evaluation of a hybrid computer cluster built on IBM POWER8 microprocessors. Progr Comput Softw 45:324–332. https://doi.org/10.1134/S0361768819060057
Sorokin A, Malkovsky S, Tsoy G, Zatsarinnyy A, Volovich K (2020) Comparative performance evaluation of modern heterogeneous high-performance computing systems CPUs. Electronics 9(6):1035. https://doi.org/10.3390/electronics9061035
ESSL Guide and Reference, IBM (2019). https://www.ibm.com/support/knowledgecenter/SSFHY8_6.2/reference/essl_reference_pdf.pdf. Accessed 17 Aug 2020
Frigo M, Johnson SG (2005) The design and implementation of FFTW3. Proc IEEE 93(2):216–231. https://doi.org/10.1109/JPROC.2004.840301
Sinharoy B, Van Norstrand JA, Eickemeyer RJ, Le HQ, Leenstra J, Nguyen DQ, Konigsburg B, Ward K, Brown MD, Moreira JE, Levitan D, Tung S, Hrusecky D, Bishop JW, Gschwind M, Boersma M, Kroener M, Kaltenbach M, Karkhanis T, Fernsler KM (2015) IBM POWER8 processor core microarchitecture. IBM J Res Dev 59(1):2:1–2:21
Sadasivam SK, Thompto BW, Kalla R, Starke WJ (2017) IBM Power9 processor architecture. IEEE Micro 37:40–51
NVIDIA (2019) CUDA Toolkit documentation: cuFFT. https://docs.nvidia.com/cuda/cufft/index.html. Accessed 01 Aug 2020
Foley D, Danskin J (2017) Ultra-Performance Pascal GPU and NVLink Interconnect. IEEE Micro 37(2):7–17. https://doi.org/10.1109/MM.2017.37
Choquette J, Giroux O, Foley D (2018) Volta: performance and programmability. IEEE Micro 38(2):42–52. https://doi.org/10.1109/MM.2018.022071134
Mulnix D (2017) Intel Xeon processor scalable family technical overview. https://software.intel.com/ru-ru/articles/intel-xeon-processor-scalable-family-technical-overview. Accessed 01 Aug 2020
Wang E et al (2014) Intel math kernel library. In: High-performance computing on the Intel® Xeon Phi™. Springer, Cham, pp 167–188. https://doi.org/10.1007/978-3-319-06486-4_7
Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL, Tullsen DM (1997) Simultaneous multithreading: a platform for next-generation processors. IEEE Micro 17(5):12–19. https://doi.org/10.1109/40.621209
Starke WJ, Stuecheli J, Daly DM, Dodson JS, Auernhammer F, Sagmeister PM, Guthrie GL, Marino CF, Siegel M, Blaner B (2015) The cache and memory subsystems of the IBM POWER8 processor. IBM J Res Dev 59(1):3:1–3:13. https://doi.org/10.1147/JRD.2014.2376131
Starke WJ, Dodson JS, Stuecheli J, Retter E, Michael BW, Powell SJ, Marcella JA (2018) IBM POWER9 memory architectures for optimized systems. IBM J Res Dev 62(4/5):3:1–3:13. https://doi.org/10.1147/JRD.2018.2846159
Steinbach P, Werner M (2017) Gearshifft—the FFT benchmark suite for heterogeneous platforms. In: Kunkel J, Yokota R, Balaji P, Keyes D (eds) High performance computing. ISC 2017. Lecture notes in computer science, vol 10266. Springer, Cham, pp 199-216. https://doi.org/10.1007/978-3-319-58667-0_11
Sorokin AA, Makogonov SV, Korolev SP (2017) The information infrastructure for collective scientific work in the far east of Russia. Sci Tech Inf Proc 44:302–304. https://doi.org/10.3103/S0147688217040153
Informatics Core Facility Statute. Available Online: http://www.frccsc.ru/ckp. Accessed 22 Jan 2020

Acknowledgements

This study used the computing resources and systems of the Shared Services Center “Data Center of FEB RAS” (Khabarovsk) [27] and the Informatics Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (Moscow) [28].

Author information

Authors and Affiliations

Computing Center of Far Eastern Branch of Russian Academy of Sciences, 65 Kim U Chen Street, Khabarovsk, Russian Federation, 680000
Sergey I. Malkovsky, Aleksei A. Sorokin, Georgiy I. Tsoy, Sergey P. Korolev & Sergey I. Smagin
Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Street, Moscow, Russian Federation, 119133
Vadim A. Kondrashev

Corresponding author

Correspondence to Sergey P. Korolev.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was partly funded by the Russian Foundation for Basic Research (RFBR), Project Number 18-29-03196.

About this article

Cite this article

Malkovsky, S.I., Sorokin, A.A., Tsoy, G.I. et al. Evaluating the performance of FFT library implementations on modern hybrid computing systems. J Supercomput 77, 8326–8354 (2021). https://doi.org/10.1007/s11227-020-03591-6

Accepted: 23 December 2020
Published: 20 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11227-020-03591-6
