High-performance SIMD modular arithmetic for polynomial evaluation, Concurrency and Computation: Practice and Experience



Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-05-25, DOI: 10.1002/cpe.6270
Pierre Fortin ₁,₂, Ambroise Fleury ₂, François Lemaire ₂, Michael Monagan ₃


Two essential problems in computer algebra, polynomial factorization and polynomial greatest common divisor computation, can be solved efficiently by means of multiple polynomial evaluations in two variables using modular arithmetic. In this article, we focus on computing such polynomial evaluations efficiently on a single CPU core. We first show how to leverage SIMD (single instruction, multiple data) computing for modular arithmetic on AVX2 and AVX-512 units, using both intrinsics and OpenMP compiler directives. We then increase the operational intensity and exploit instruction-level parallelism to improve the compute efficiency of these polynomial evaluations. Together, these techniques yield performance gains of up to about 5x on AVX2 and 10x on AVX-512.
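To make the idea of SIMD modular arithmetic driven by OpenMP directives concrete, here is a minimal sketch in C. It is not the article's implementation. It evaluates a univariate polynomial (the article targets the bivariate case) at many points modulo p with Horner's rule, and marks the loop over the points with `#pragma omp simd` so that a compiler may vectorize it. The function names `mul_add_mod` and `horner_mod_p`, the choice of storing residues as integer-valued doubles, the floating-point quotient estimate, and the restriction p <= 2^26 are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: (a * b + c) mod p for integer-valued doubles
   a, b, c in [0, p), assuming p <= 2^26 so every intermediate value
   stays below 2^53 and is represented exactly in a double. */
static inline double mul_add_mod(double a, double b, double c,
                                 double p, double pinv)
{
    double t = a * b + c;                      /* exact, t < 2^53 */
    double q = (double)(int32_t)(t * pinv);    /* quotient estimate, off by at most 1 */
    double r = t - q * p;                      /* exact, lies in [-p, 2p) */
    if (r < 0.0) r += p;                       /* correct an overestimated quotient */
    if (r >= p)  r -= p;                       /* correct an underestimated quotient */
    return r;
}

/* Hypothetical driver: y[j] = coef[0] + coef[1]*x[j] + ... + coef[deg]*x[j]^deg
   (mod p) for j = 0..n-1. All inputs must already be reduced into [0, p). */
void horner_mod_p(const double *coef, size_t deg,
                  const double *x, double *y, size_t n, uint32_t p)
{
    assert(p >= 2 && p <= (UINT32_C(1) << 26));
    const double pd = (double)p;
    const double pinv = 1.0 / pd;

    for (size_t j = 0; j < n; j++)
        y[j] = coef[deg];

    /* One Horner step per coefficient; the inner loop over the evaluation
       points has independent iterations and is the SIMD candidate. */
    for (size_t i = deg; i-- > 0; ) {
        const double ci = coef[i];
        #pragma omp simd
        for (size_t j = 0; j < n; j++)
            y[j] = mul_add_mod(y[j], x[j], ci, pd, pinv);
    }
}
```

For example, with p = 97 and coefficients {3, 2, 5}, calling `horner_mod_p` on the points {0, 1, 2, 10, 96} gives {3, 10, 27, 38, 6}. The pragma takes effect only when OpenMP SIMD support is enabled at compile time (for instance `-fopenmp-simd` with GCC or Clang); otherwise it is ignored and the loop runs as ordinary scalar code. Whether the loop is actually vectorized depends on the compiler and the target instruction set.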


Updated: 2021-07-20
