Lobachevskii Journal of Mathematics (IF 0.8), Pub Date: 2021-02-04, DOI: 10.1134/s1995080220120331
G. I. Savin, B. M. Shabanov, A. A. Rybakov, S. S. Shumilin
Abstract
The widespread use of supercomputer technologies in many areas, together with the demand for high-performance computing, makes improving the performance of program code on supercomputers of modern architectures a pressing problem. Vectorization of program code is a low-level optimization that, applied in a relatively local and compact way, can speed up computational code several-fold. Modern Intel microprocessors support the AVX-512 instruction set, whose features make it possible to vectorize almost any code written in predicated form. A set of simple restrictions observed during program development, combined with vectorization tools that enable the AVX-512 instruction set, can significantly speed up the resulting program. The article discusses approaches to vectorizing flat loops, a special-purpose program context whose successful vectorization can increase the performance of supercomputer applications even for program code that optimizing compilers cannot vectorize on their own.
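To make the notion of "predicated form" concrete, the following minimal sketch shows one loop with a data-dependent branch rewritten so that both branches are computed and a predicate selects the result per element. It is an illustration only and is not taken from the article: the function names `flat_loop_branchy` and `flat_loop_predicated`, the parameters `a` and `b`, and the loop body itself are hypothetical. The AVX-512 path uses mask intrinsics from `immintrin.h` and is compiled only when the compiler defines `__AVX512F__`; otherwise a scalar predicated fallback is used.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical example loop with a data-dependent branch in its body. */
void flat_loop_branchy(const float *x, float *y, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0f)
            y[i] = a * x[i];
        else
            y[i] = b - x[i];
    }
}

/* The same loop in predicated form: both branches are evaluated and a
 * predicate (mask) chooses the result for each element. */
void flat_loop_predicated(const float *x, float *y, size_t n, float a, float b)
{
    size_t i = 0;
#if defined(__AVX512F__)
    const __m512 va = _mm512_set1_ps(a);
    const __m512 vb = _mm512_set1_ps(b);
    const __m512 zero = _mm512_setzero_ps();

    /* Full vectors of 16 floats. */
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        /* Predicate: one mask bit per lane, set where x > 0. */
        __mmask16 p = _mm512_cmp_ps_mask(vx, zero, _CMP_GT_OQ);
        __m512 t = _mm512_mul_ps(va, vx);   /* "then" branch */
        __m512 f = _mm512_sub_ps(vb, vx);   /* "else" branch */
        /* Blend takes t where the mask bit is set, f elsewhere. */
        _mm512_storeu_ps(y + i, _mm512_mask_blend_ps(p, f, t));
    }

    /* Remainder (1..15 elements): a tail mask limits loads and stores. */
    if (i < n) {
        __mmask16 tail = (__mmask16)((1u << (unsigned)(n - i)) - 1u);
        __m512 vx = _mm512_maskz_loadu_ps(tail, x + i);
        __mmask16 p = _mm512_cmp_ps_mask(vx, zero, _CMP_GT_OQ);
        __m512 t = _mm512_mul_ps(va, vx);
        __m512 f = _mm512_sub_ps(vb, vx);
        _mm512_mask_storeu_ps(y + i, tail, _mm512_mask_blend_ps(p, f, t));
    }
#else
    /* Scalar fallback that keeps the predicated structure. */
    for (; i < n; i++) {
        int p = x[i] > 0.0f;     /* predicate */
        float t = a * x[i];      /* "then" branch */
        float f = b - x[i];      /* "else" branch */
        y[i] = p ? t : f;        /* select by predicate */
    }
#endif
}
```

Computing both branches is only safe when neither has side effects or can fault; in this sketch both are plain arithmetic, so the predicated version should produce the same values as the branchy one.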
Title (from the Chinese translation): Vectorization of flat loops of arbitrary structure using AVX-512 instructions