FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
arXiv - CS - Performance. Pub Date: 2021-01-13, DOI: arxiv-2101.05615
Daya Khudia, Jianyu Huang, Protonu Basu, Summer Deng, Haixin Liu, Jongsoo Park, Mikhail Smelyanskiy

Deep learning models typically use single-precision (FP32) floating-point data types for representing activations and weights, but a slew of recent research work has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers, or even 4- or 2-bit integers) are sufficient to achieve the same accuracy as FP32 and are much more efficient. Therefore, we designed fbgemm, a high-performance kernel library, from the ground up to perform high-performance quantized inference on current-generation CPUs. fbgemm achieves efficiency by fusing common quantization operations with a high-performance gemm implementation and by shape- and size-specific kernel code generation at runtime. The library has been deployed at Facebook, where it delivers greater than 2x performance gains over our current production baseline.
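To make the fused quantize-GEMM-dequantize flow concrete, here is a minimal, unoptimized C++ sketch of the general technique the abstract describes. This is not FBGEMM's actual API; all names (QParams, qgemm_dequant, etc.) are hypothetical, and a real implementation would use vectorized int8 kernels and runtime-generated, shape-specialized code rather than this naive triple loop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Affine quantization parameters: real_value = scale * (q - zero_point).
struct QParams { float scale; int32_t zero_point; };

// Pick scale/zero_point so the observed [min, max] range (including 0)
// maps onto the int8 range [-128, 127].
QParams choose_qparams(const std::vector<float>& x) {
    float mn = *std::min_element(x.begin(), x.end());
    float mx = *std::max_element(x.begin(), x.end());
    mn = std::min(mn, 0.0f);
    mx = std::max(mx, 0.0f);
    float scale = (mx - mn) / 255.0f;
    if (scale == 0.0f) scale = 1.0f;  // degenerate all-zero tensor
    int32_t zp = static_cast<int32_t>(std::round(-mn / scale)) - 128;
    return {scale, zp};
}

int8_t quantize(float x, QParams q) {
    int32_t v = static_cast<int32_t>(std::round(x / q.scale)) + q.zero_point;
    return static_cast<int8_t>(std::clamp<int32_t>(v, -128, 127));
}

// int8 x int8 GEMM accumulating in int32, with dequantization fused into
// the output write: C (MxN, float) = dequant(A (MxK, int8) * B (KxN, int8)).
void qgemm_dequant(const std::vector<int8_t>& A, QParams qa,
                   const std::vector<int8_t>& B, QParams qb,
                   std::vector<float>& C, int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k) {
                acc += (int32_t(A[i * K + k]) - qa.zero_point) *
                       (int32_t(B[k * N + j]) - qb.zero_point);
            }
            // Fused dequantization: one multiply per output element.
            C[i * N + j] = qa.scale * qb.scale * float(acc);
        }
    }
}

int main() {
    const int M = 2, N = 2, K = 3;
    std::vector<float> a = {0.5f, -1.0f, 2.0f, 1.5f, 0.0f, -0.5f};
    std::vector<float> b = {1.0f, -2.0f, 0.5f, 0.25f, -1.0f, 3.0f};
    QParams qa = choose_qparams(a), qb = choose_qparams(b);
    std::vector<int8_t> A(a.size()), B(b.size());
    for (size_t i = 0; i < a.size(); ++i) A[i] = quantize(a[i], qa);
    for (size_t i = 0; i < b.size(); ++i) B[i] = quantize(b[i], qb);
    std::vector<float> C(M * N);
    qgemm_dequant(A, qa, B, qb, C, M, N, K);
    for (int i = 0; i < M; ++i, std::printf("\n"))
        for (int j = 0; j < N; ++j) std::printf("%8.4f ", C[i * N + j]);
    return 0;
}
```

The fusion matters because the int32 accumulator is requantized (here, dequantized) while it is still in registers, avoiding a second pass over the output matrix; the paper applies the same idea to fuse operations such as requantization and bias addition into the GEMM epilogue.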

Updated: 2021-01-15