E-BATCH: Energy-Efficient and High-Throughput RNN Batching
arXiv - CS - Hardware Architecture, Pub Date: 2020-09-22, DOI: arxiv-2009.10656
Franyell Silfa, Jose Maria Arnau, and Antonio Gonzalez

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding, since the batched input sequences may differ greatly in length. Schemes that dynamically update the batch every few time-steps avoid padding, but they require executing different RNN layers in a short timespan, which decreases energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of a sequence is done, so that a new sequence can be added to the batch immediately, which greatly reduces the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. On E-PUR, E-BATCH improves throughput by 1.8x and energy efficiency by 3.6x, whereas on TPU it improves throughput by 2.1x and energy efficiency by 1.6x, over the state-of-the-art.
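The padding argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the function names `static_padding` and `refill_padding` are hypothetical, sequence lengths are assumed to be positive integers, and the model ignores RNN layers, energy, and E-BATCH's control over the number of time-steps per batch. It only counts wasted batch slots under two policies: a static batch that runs until its longest sequence ends, and a batch whose freed slot is given to the next pending sequence as soon as a sequence finishes.

```python
from collections import deque


def static_padding(lengths, batch_size):
    """Padded time-steps when each batch runs until its longest sequence ends."""
    padding = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        padding += sum(longest - n for n in batch)
    return padding


def refill_padding(lengths, batch_size):
    """Idle slot time-steps when a finished sequence is replaced immediately.

    Assumption: every length is a positive integer.
    """
    pending = deque(lengths)
    # Each slot holds the number of time-steps left for its current sequence.
    slots = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
    idle = 0
    while any(slots):
        # Advance the whole batch by one time-step.
        for i, remaining in enumerate(slots):
            if remaining == 0:
                # No pending sequence was available to fill this slot.
                idle += 1
                continue
            slots[i] = remaining - 1
            if slots[i] == 0 and pending:
                # Sequence finished: hand the slot to the next sequence.
                slots[i] = pending.popleft()
    return idle


if __name__ == "__main__":
    lengths = [10, 2, 3, 9]
    print(static_padding(lengths, batch_size=2))
    print(refill_padding(lengths, batch_size=2))
```

In this toy setting, wasted slots appear under the refill policy only once the queue of pending sequences is empty, whereas the static policy pays for the length mismatch inside every batch.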

Updated: 2020-09-23