Voltage Scaling for Partitioned Systolic Array in A Reconfigurable Platform
arXiv - CS - Hardware Architecture Pub Date : 2021-02-13 , DOI: arxiv-2102.06888
Rourab Paul, Sreetama Sarkar, Suman Sau, Koushik Chakraborty, Sanghamitra Roy, Amlan Chakrabarti

The rapid emergence of Field Programmable Gate Arrays (FPGAs) has accelerated research on hardware implementations of Deep Neural Networks (DNNs). Among DNN processors, domain-specific architectures such as Google's Tensor Processing Unit (TPU) have outperformed conventional GPUs. However, implementing TPUs in reconfigurable hardware should emphasize energy savings to meet green computing requirements. Voltage scaling, a popular approach to saving energy, is delicate in FPGAs because it can cause timing failures if not applied appropriately. In this work, we present an ultra-low-power FPGA implementation of a TPU for edge applications. We divide the systolic array of a TPU into different FPGA partitions, where each partition uses a different near-threshold computing (NTC) biasing voltage to run its FPGA cores. The biasing voltage for each partition is first estimated roughly by the proposed offline schemes and then further calibrated by the proposed online scheme. Four clustering algorithms, based on the slack values of different design paths, are studied for partitioning the FPGA. To overcome the timing failures caused by NTC, higher-slack paths are placed in lower-voltage partitions and lower-slack paths are placed in higher-voltage partitions. The proposed architecture is simulated for an Artix-7 FPGA using the Vivado design suite and a Python tool. The simulation results substantiate the implementation of a voltage-scaled TPU in FPGAs and also justify its power efficiency.
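The slack-to-voltage placement rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not name the four clustering algorithms, so a simple equal-size split by slack rank stands in for them here. The function name `partition_by_slack`, the path names, the slack values, and the voltage levels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def partition_by_slack(slacks_ns: Dict[str, float],
                       voltages: List[float]) -> Dict[str, float]:
    """Map each path to a partition voltage: more slack -> lower voltage.

    slacks_ns: path name -> timing slack in ns (hypothetical values).
    voltages:  candidate partition voltages in volts (hypothetical).
    """
    if not slacks_ns or not voltages:
        return {}
    # Order paths from least slack (most timing-critical) to most slack.
    ordered = sorted(slacks_ns, key=slacks_ns.get)
    # Order voltages from highest to lowest, so critical paths get the highest.
    levels = sorted(voltages, reverse=True)
    k, n = len(levels), len(ordered)
    assignment = {}
    for rank, path in enumerate(ordered):
        # Equal-size split by rank: an assumed stand-in for real clustering.
        group = min(rank * k // n, k - 1)
        assignment[path] = levels[group]
    return assignment


# Hypothetical slack values for four design paths and two voltage levels.
example = partition_by_slack(
    {"p0": 0.2, "p1": 1.5, "p2": 0.9, "p3": 3.0},
    [0.6, 0.8],
)
print(example)
```

In this sketch the two least-slack paths (`p0`, `p2`) land in the 0.8 V partition and the two highest-slack paths (`p1`, `p3`) land in the 0.6 V partition, which mirrors the rule that low-slack paths need the higher voltage to avoid timing failure. A real flow would take slack values from a timing report and would still need the online calibration step the abstract mentions.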

Updated: 2021-02-16