FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
arXiv - CS - Performance. Pub Date: 2021-01-13, DOI: arxiv-2101.05615
Daya Khudia, Jianyu Huang, Protonu Basu, Summer Deng, Haixin Liu, Jongsoo Park, Mikhail Smelyanskiy

Deep learning models typically use single-precision (FP32) floating-point data types for representing activations and weights, but a slew of recent research work has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers, or even 4- or 2-bit integers) are sufficient to achieve the same accuracy as FP32 and are much more efficient. Therefore, we designed fbgemm, a high-performance kernel library, from the ground up to perform high-performance quantized inference on current-generation CPUs. fbgemm achieves efficiency by fusing common quantization operations with a high-performance gemm implementation and by shape- and size-specific kernel code generation at runtime. The library has been deployed at Facebook, where it delivers greater than 2x performance gains over our current production baseline.
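To make the fused quantize-GEMM-dequantize flow concrete, here is a minimal, unoptimized C++ sketch of the general technique the abstract describes. This is not FBGEMM's actual API; all names (QParams, qgemm_dequant, etc.) are hypothetical, and a real implementation would use vectorized int8 kernels and runtime-generated, shape-specialized code rather than this naive triple loop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Affine quantization parameters: real_value = scale * (q - zero_point).
struct QParams { float scale; int32_t zero_point; };

// Pick scale/zero_point so the observed [min, max] range (including 0)
// maps onto the int8 range [-128, 127].
QParams choose_qparams(const std::vector<float>& x) {
    float mn = *std::min_element(x.begin(), x.end());
    float mx = *std::max_element(x.begin(), x.end());
    mn = std::min(mn, 0.0f);
    mx = std::max(mx, 0.0f);
    float scale = (mx - mn) / 255.0f;
    if (scale == 0.0f) scale = 1.0f;  // degenerate all-zero tensor
    int32_t zp = static_cast<int32_t>(std::round(-mn / scale)) - 128;
    return {scale, zp};
}

int8_t quantize(float x, QParams q) {
    int32_t v = static_cast<int32_t>(std::round(x / q.scale)) + q.zero_point;
    return static_cast<int8_t>(std::clamp<int32_t>(v, -128, 127));
}

// int8 x int8 GEMM accumulating in int32, with dequantization fused into
// the output write: C (MxN, float) = dequant(A (MxK, int8) * B (KxN, int8)).
void qgemm_dequant(const std::vector<int8_t>& A, QParams qa,
                   const std::vector<int8_t>& B, QParams qb,
                   std::vector<float>& C, int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k) {
                acc += (int32_t(A[i * K + k]) - qa.zero_point) *
                       (int32_t(B[k * N + j]) - qb.zero_point);
            }
            // Fused dequantization: one multiply per output element.
            C[i * N + j] = qa.scale * qb.scale * float(acc);
        }
    }
}

int main() {
    const int M = 2, N = 2, K = 3;
    std::vector<float> a = {0.5f, -1.0f, 2.0f, 1.5f, 0.0f, -0.5f};
    std::vector<float> b = {1.0f, -2.0f, 0.5f, 0.25f, -1.0f, 3.0f};
    QParams qa = choose_qparams(a), qb = choose_qparams(b);
    std::vector<int8_t> A(a.size()), B(b.size());
    for (size_t i = 0; i < a.size(); ++i) A[i] = quantize(a[i], qa);
    for (size_t i = 0; i < b.size(); ++i) B[i] = quantize(b[i], qb);
    std::vector<float> C(M * N);
    qgemm_dequant(A, qa, B, qb, C, M, N, K);
    for (int i = 0; i < M; ++i, std::printf("\n"))
        for (int j = 0; j < N; ++j) std::printf("%8.4f ", C[i * N + j]);
    return 0;
}
```

The fusion matters because the int32 accumulator is requantized (here, dequantized) while it is still in registers, avoiding a second pass over the output matrix; the paper applies the same idea to fuse operations such as requantization and bias addition into the GEMM epilogue.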

Updated: 2021-01-15