Optimization Approach to Accelerator Codesign, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IF 2.7), Pub Date: 2020-06-01, DOI: 10.1109/tcad.2019.2926489
Nirmal Prajapati, Sanjay Rajopadhye, Hristo Djidjev, Nandakishore Santhi, Tobias Grosser, Rumen Andonov

We propose an optimization approach for determining both the hardware and the software parameters for the efficient implementation of a family of applications, dense stencil computations, on programmable general-purpose graphics processing units (GPGPUs). We first introduce a simple analytical model for the silicon area usage of accelerator architectures, together with a workload characterization of stencil computations. We combine this characterization with a parametric execution-time model and formulate a mathematical optimization problem that seeks to maximize a common objective function of all the hardware and software parameters. The solution to this problem therefore “solves” the codesign problem: it simultaneously chooses software and hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2-D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3-D stencils (Heat and Laplacian). Our model predicts that performance could improve by 28% (respectively, 33%) with simple tweaks to the hardware parameters, such as the number of streaming multiprocessors, the number of compute cores in each, and the size of shared memory. We also develop a number of insights into the optimal regions of the design landscape.
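The idea of jointly choosing hardware and software parameters under an area budget can be illustrated with a toy exhaustive search. This is a minimal sketch, not the authors' model. The linear area model, the roofline-style timing (compute and memory assumed to overlap perfectly), the flop counts per stencil, and every constant and function name below (`area`, `sweep_time`, `workload_time`, `codesign_search`) are hypothetical and chosen only for illustration. Only 2-D stencils are included. The hardware parameters are the number of SMs, cores per SM, and shared memory per SM. The software parameter is a square tile size that is tuned separately for each stencil. The objective is total workload time, which is minimized, equivalent to maximizing performance.

```python
import itertools
import math

# Illustrative constants only (hypothetical, NOT taken from the paper).
AREA_BUDGET = 400.0     # total area available to SMs, arbitrary units
A_SM_BASE = 4.0         # fixed per-SM overhead (schedulers, caches, ...)
A_CORE = 0.05           # area per compute core
A_SHM_PER_KB = 0.1      # area per KB of shared memory
CLOCK_HZ = 1.0e9        # core clock
MEM_BW = 2.0e12         # global memory bandwidth, bytes/s
BYTES = 4               # single-precision values
N = 4096                # grid edge length for each 2-D stencil sweep

# Hypothetical workload: (stencil name, flops per grid point, halo radius).
WORKLOAD = [("heat2d", 10, 1), ("jacobi2d", 5, 1),
            ("laplacian2d", 6, 1), ("gradient2d", 8, 1)]


def area(n_sm, n_cores, shm_kb):
    """Linear area model: each SM pays a fixed cost plus cores plus shared memory."""
    return n_sm * (A_SM_BASE + A_CORE * n_cores + A_SHM_PER_KB * shm_kb)


def sweep_time(n_sm, n_cores, shm_kb, tile, flops, halo):
    """Time of one N x N sweep with square tiles; None if the haloed tile does not fit."""
    padded = tile + 2 * halo
    if padded * padded * BYTES > shm_kb * 1024:
        return None  # tile plus halo must be staged in shared memory
    num_tiles = math.ceil(N / tile) ** 2
    # Compute: tiles are distributed over SMs in rounds.
    compute = math.ceil(num_tiles / n_sm) * tile * tile * flops / (n_cores * CLOCK_HZ)
    # Memory: every tile reads its haloed input and writes its output once.
    memory = num_tiles * (padded * padded + tile * tile) * BYTES / MEM_BW
    return max(compute, memory)  # assumes perfect compute/memory overlap


def workload_time(n_sm, n_cores, shm_kb, tiles=(16, 32, 64, 128)):
    """Total workload time, with the tile size tuned per stencil; None if infeasible."""
    total = 0.0
    for _name, flops, halo in WORKLOAD:
        candidates = [t for t in (sweep_time(n_sm, n_cores, shm_kb, tile, flops, halo)
                                  for tile in tiles) if t is not None]
        if not candidates:
            return None
        total += min(candidates)
    return total


def codesign_search():
    """Exhaustively search hardware configurations that fit the area budget."""
    best = None
    for n_sm, n_cores, shm_kb in itertools.product(
            range(8, 33, 4), (64, 128, 192, 256), (48, 64, 96, 128)):
        if area(n_sm, n_cores, shm_kb) > AREA_BUDGET:
            continue
        t = workload_time(n_sm, n_cores, shm_kb)
        if t is not None and (best is None or t < best[0]):
            best = (t, (n_sm, n_cores, shm_kb))
    return best


if __name__ == "__main__":
    best_time, config = codesign_search()
    print(f"best (n_sm, cores/SM, shm KB) = {config}, modeled time = {best_time:.3e} s")
```

Exhaustive enumeration is used here only because the toy design space is tiny. A realistic formulation, such as the mathematical optimization problem described in the abstract, would instead rely on a solver over a much larger space.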

Updated: 2020-06-01
