New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference
IEEE Transactions on Computers ( IF 3.7 ) Pub Date : 2020-01-01 , DOI: 10.1109/tc.2019.2936192
Hao Zhang , Dongdong Chen , Seok-Bum Ko

This paper proposes a new flexible multiple-precision multiply-accumulate (MAC) unit for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point format, the unit supports either one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the unit more versatile, bit-width can be flexibly exchanged between the exponent and the mantissa. Setting the exponent bit-width to zero makes the unit operate in fixed-point format. In fixed-point format, the unit supports either one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend. Moreover, the unit can be further divided to support the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the unit accumulates eight 1-bit logical AND operations to support binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed unit provides more flexibility with only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, the proposed unit requires much less hardware cost but still provides an 8-bit exponent in the numerical format to maintain a large dynamic range for deep learning computing.
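The fixed-point modes listed in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: the function name `flexible_mac`, the packing of lanes into an operand word (least-significant lane first), the unsigned interpretation of each lane, and the unbounded result width are all hypothetical choices made for illustration. The abstract specifies only the lane counts and the 16-bit addend.

```python
# Behavioral sketch of the fixed-point modes described in the abstract.
# Assumptions (not from the paper): lanes are unsigned, packed into the
# operand word least-significant lane first, and the result is returned
# at full precision with no saturation or wrap-around.

# lane width in bits -> number of products summed (counts from the abstract)
MODES = {16: 1, 8: 2, 4: 4, 1: 8}


def flexible_mac(a, b, addend, lane_bits):
    """Return addend + sum of lane-wise products of packed operands a and b."""
    if lane_bits not in MODES:
        raise ValueError("lane_bits must be one of 16, 8, 4, 1")
    if not 0 <= addend < (1 << 16):
        raise ValueError("addend must fit in 16 bits")

    mask = (1 << lane_bits) - 1
    total = addend
    for lane in range(MODES[lane_bits]):
        shift = lane * lane_bits
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # For 1-bit lanes the product x * y equals the logical AND x & y.
        total += x * y
    return total


if __name__ == "__main__":
    print(flexible_mac(3, 5, 10, 16))                  # one 16-bit MAC
    print(flexible_mac(0x0203, 0x0405, 1, 8))          # two 8-bit products
    print(flexible_mac(0x1234, 0x1111, 0, 4))          # four 4-bit products
    print(flexible_mac(0b10110110, 0b11010101, 0, 1))  # eight 1-bit ANDs
```

In the 1-bit mode the sum reduces to a population count of `a & b` over eight bits, which is the accumulation that the abstract associates with binary neural networks. A real datapath would also have to define signedness and accumulator overflow behavior, which this sketch leaves open.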

Updated: 2020-01-01