Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Proceedings of the IEEE (IF 23.2). Pub Date: 2021-08-05, DOI: 10.1109/jproc.2021.3098483
Shail Dave , Riyadh Baghdadi , Tony Nowatzki , Sasikanth Avancha , Aviral Shrivastava , Baoxin Li

Machine learning (ML) models are widely used in many important domains. To process these compute- and memory-intensive applications efficiently, the tensors of these overparameterized models are compressed by leveraging sparsity, size reduction, and quantization. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This article provides a comprehensive survey of the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support, categorizes different hardware designs and acceleration techniques, analyzes them in terms of hardware and execution costs, analyzes achievable accelerations for recent DNNs, and highlights further opportunities in terms of hardware/software/model codesign optimizations (inter/intramodule). The takeaways from this article include the following: understanding the key challenges in accelerating sparse, irregularly shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computations; analyzing tradeoffs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the nonzeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; and understanding recent design trends for efficient accelerations and further opportunities.
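As an illustration of the encoding/extraction tradeoffs the abstract mentions (not an example from the survey itself), the sketch below shows compressed sparse row (CSR), one common format for storing only the nonzeros of a sparse weight tensor, together with a matrix-vector product that skips the zeros entirely. The matrix and helper names are hypothetical, chosen for the example.

```python
def to_csr(dense):
    """Encode a dense 2-D list into CSR: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:          # keep only nonzeros
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # nonzeros seen so far ends this row
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded matrix by a dense vector, touching only nonzeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A 75%-sparse 4x4 weight matrix: only 4 of 16 entries are nonzero.
W = [[0, 2, 0, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0],
     [5, 0, 0, 7]]
vals, cols, ptrs = to_csr(W)
y = csr_matvec(vals, cols, ptrs, [1, 1, 1, 1])
# vals = [2, 3, 5, 7]; ptrs = [0, 1, 2, 2, 4]; y = [2, 3, 0, 12]
```

Note how unstructured sparsity shows up here as irregularity: each row owns a different number of nonzeros (row 2 owns none, row 3 owns two), which is exactly the load imbalance that accelerator designs for structured sparsity aim to avoid.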
