FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
arXiv - CS - Performance Pub Date : 2021-01-13 , DOI: arxiv-2101.05615 Daya Khudia, Jianyu Huang, Protonu Basu, Summer Deng, Haixin Liu, Jongsoo Park, Mikhail Smelyanskiy
Deep learning models typically use single-precision (FP32) floating-point data
types to represent activations and weights, but a slew of recent research has
shown that computations with reduced-precision data types (FP16, 16-bit
integers, 8-bit integers, or even 4- or 2-bit integers) can achieve the same
accuracy as FP32 while being much more efficient. We therefore designed fbgemm,
a high-performance kernel library, from the ground up to perform
high-performance quantized inference on current-generation CPUs. fbgemm
achieves efficiency by fusing common quantization operations with a
high-performance GEMM implementation and by generating shape- and size-specific
kernel code at runtime. The library has been deployed at Facebook, where it
delivers greater than 2x performance gains with respect to our current
production baseline.
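To make the abstract's core idea concrete, here is a minimal pure-Python sketch of an 8-bit quantized matrix multiply with fused requantization. It illustrates the general technique the paper describes (integer GEMM with 32-bit accumulation, followed by rescaling the result back to 8 bits in the same pass); the function names, scale/zero-point parameters, and the scalar loop are illustrative assumptions, not fbgemm's actual API, which is a templated, JIT-compiled C++ library.

```python
def quantize(xs, scale, zero_point):
    # Affine quantization: map floats to uint8, clamped to [0, 255].
    return [max(0, min(255, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    # Inverse mapping: recover approximate float values.
    return [(q - zero_point) * scale for q in qs]

def quantized_matmul(A_q, B_q, a_zp, b_zp, a_scale, b_scale,
                     out_scale, out_zp, m, k, n):
    """Integer GEMM with 32-bit accumulation and fused requantization.

    Hypothetical scalar reference; real libraries vectorize this and
    generate shape-specific kernels at runtime.
    """
    C_q = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # 32-bit accumulator for int8 x int8 products
            for p in range(k):
                acc += (A_q[i][p] - a_zp) * (B_q[p][j] - b_zp)
            # Fused requantization: rescale the int32 result to a
            # uint8 output without a separate dequantize/quantize pass.
            real = acc * a_scale * b_scale
            C_q[i][j] = max(0, min(255, round(real / out_scale) + out_zp))
    return C_q
```

For example, quantizing two small matrices with scale 0.1 and zero point 0, multiplying them, and dequantizing the result recovers the float product exactly in this toy case, since the chosen values are representable on the quantization grid.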
Updated: 2021-01-15