Efficient AI System Design With Cross-Layer Approximate Computing
Proceedings of the IEEE (IF 20.6), Pub Date: 2020-12-01, DOI: 10.1109/jproc.2020.3029453
Swagath Venkataramani , Xiao Sun , Naigang Wang , Chia-Yu Chen , Jungwook Choi , Mingu Kang , Ankur Agarwal , Jinwook Oh , Shubham Jain , Tina Babinsky , Nianzheng Cao , Thomas Fox , Bruce Fleischer , George Gristede , Michael Guillorn , Howard Haynie , Hiroshi Inoue , Kazuaki Ishizaki , Michael Klaiber , Shih-Hsien Lo , Gary Maier , Silvia Mueller , Michael Scheuermann , Eri Ogawa , Marcel Schaal , Mauricio Serrano , Joel Silberman , Christos Vezyrtzis , Wei Wang , Fanchieh Yee , Jintao Zhang , Matthew Ziegler , Ching Zhou , Moriyoshi Ohara , Pong-Fei Lu , Brian Curran , Sunil Shukla , Vijayalakshmi Srinivasan , Leland Chang , Kailash Gopalakrishnan

Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve their operational efficiency. Leveraging the application-level insight of error resilience, we demonstrate how approximate computing (AxC) can significantly boost the efficiency of AI platforms and play a pivotal role in the broader adoption of AI-based applications and services. To this end, we present RaPiD, a multi-tera operations per second (TOPS) AI hardware accelerator core (fabricated in 14-nm technology) that we built from the ground up using AxC techniques across the stack, including algorithms, architecture, programmability, and hardware. We highlight the workload-guided systematic exploration of AxC techniques for AI employed in the RaPiD accelerator, including custom number representations, quantization/pruning methodologies, mixed-precision architecture design, instruction sets, and compiler technologies with quality programmability.
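To give a concrete flavor of the approximation knobs the abstract names, the sketch below shows generic magnitude pruning followed by symmetric int8 quantization of a weight tensor. It is a minimal, illustrative example assuming NumPy; the function names, the per-tensor scale, and the 50% sparsity target are assumptions chosen for illustration and are not the specific quantization/pruning schemes used in RaPiD.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights).astype(weights.dtype)

def quantize_symmetric(weights, num_bits=8):
    """Symmetric uniform quantization to signed `num_bits`-bit integers (per-tensor scale)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = np.max(np.abs(weights)) / qmax      # single scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

# Toy usage: approximate a small FP32 "layer" for cheaper storage and compute.
w = np.random.randn(4, 4).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_symmetric(w_sparse, num_bits=8)
w_approx = q.astype(np.float32) * scale         # dequantized view of the approximation
print("max abs error introduced:", np.max(np.abs(w - w_approx)))
```

The error reported at the end is the application-level quantity that AxC approaches trade against efficiency: as long as it stays within the network's error resilience, the cheaper low-precision, sparse representation preserves task accuracy.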

Updated: 2020-12-01