Accelerating number theoretic transform in GPU platform for fully homomorphic encryption, The Journal of Supercomputing


Accelerating number theoretic transform in GPU platform for fully homomorphic encryption
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-05-18, DOI: 10.1007/s11227-020-03156-7
Jia-Zheng Goey, Wai-Kong Lee, Bok-Min Goi, Wun-She Yap

Many applications in scientific computing and cryptography involve large integer multiplication, which is a time-consuming operation. To reduce the computational complexity, the number theoretic transform (NTT) is widely used, because it allows the multiplication to be performed in the frequency domain at lower cost. However, the speed of large integer multiplication is still unsatisfactory when the operands are very large (e.g., more than 100K bits). For this reason, several researchers have proposed accelerating the number theoretic transform on massively parallel GPU architectures. In this paper, we propose several techniques that improve the performance of the number theoretic transform, yielding an implementation that is faster than the state-of-the-art work by Dai et al. The proposed techniques include register-based twiddle factor storage and multi-stream asynchronous computation, both of which leverage features offered by new GPU architectures. The proposed number theoretic transform implementation was applied to the CMNT fully homomorphic encryption scheme proposed by Coron et al. With the proposed techniques, homomorphic multiplications in CMNT take 0.27 ms on a GTX1070 desktop GPU and 7.49 ms on a Jetson TX1 embedded system. This shows that the proposed implementation is suitable for practical applications in server environments as well as in embedded systems.
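To make the idea of multiplying in the frequency domain concrete, the following is a minimal CPU-side sketch in Python of NTT-based large integer multiplication. It is not the authors' GPU implementation: the prime 998244353 with primitive root 3, the 8-bit digit size, and the function names `ntt`, `to_digits`, and `multiply` are all assumptions chosen for illustration. The sketch splits each operand into small digits, transforms both digit vectors, multiplies them pointwise, and transforms back to obtain the product.

```python
# Illustrative parameters (assumptions, not taken from the paper).
MOD = 998244353   # NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3          # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 NTT of a list whose length is a power of two."""
    a = list(values)
    n = len(a)

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    length = 2
    while length <= n:
        # Twiddle factor: a primitive `length`-th root of unity mod MOD.
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse (Fermat)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1

    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def to_digits(x, bits):
    """Split a non-negative integer into little-endian digits of `bits` bits."""
    mask = (1 << bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= bits
    return digits or [0]


def multiply(x, y, bits=8):
    """Multiply two non-negative integers via NTT-based convolution."""
    da, db = to_digits(x, bits), to_digits(y, bits)
    result_len = len(da) + len(db) - 1
    n = 1
    while n < result_len:
        n <<= 1
    # Every convolution coefficient must stay below MOD to be exact.
    assert min(len(da), len(db)) * ((1 << bits) - 1) ** 2 < MOD
    fa = ntt(da + [0] * (n - len(da)))
    fb = ntt(db + [0] * (n - len(db)))
    fc = [p * q % MOD for p, q in zip(fa, fb)]  # pointwise product
    coeffs = ntt(fc, invert=True)
    # Recombine the coefficients; the shifts perform the carry propagation.
    return sum(c << (bits * i) for i, c in enumerate(coeffs[:result_len]))
```

For example, `multiply(3**200, 7**150)` should agree with Python's built-in `3**200 * 7**150`. The inner butterfly loop is the part a GPU implementation would parallelize, and the `w_len` powers are the twiddle factors whose storage the abstract refers to.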


Updated: 2020-05-18
