Rethinking Co-design of Neural Architectures and Hardware Accelerators
arXiv - CS - Hardware Architecture. Pub Date: 2021-02-17, DOI: arxiv-2102.08619
Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon

Neural architectures and hardware accelerators have been two driving forces behind progress in deep learning. Previous works typically optimize either the hardware for a fixed model architecture or the model architecture for fixed hardware, and the dominant hardware platform explored in this prior work is the FPGA. In our work, we target the joint optimization of hardware and software configurations on an industry-standard edge accelerator. We systematically study the importance of, and strategies for, co-designing neural architectures and hardware accelerators. We make three observations: 1) the software search space has to be customized to fully leverage the targeted hardware architecture, 2) the search over model architecture and hardware architecture should be done jointly to achieve the best of both worlds, and 3) different use cases lead to very different search outcomes. Our experiments show that the joint search method consistently outperforms previous platform-aware neural architecture search, manually crafted models, and the state-of-the-art EfficientNet on all latency targets, by around 1% in ImageNet top-1 accuracy. When co-adapting the model architecture and hardware accelerator configurations, our method can reduce the energy consumption of an edge accelerator by up to 2x under the same accuracy constraint.
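The joint search the abstract describes — sampling from a combined model/hardware space and rejecting candidates that miss a latency target — can be sketched with a simple random-search loop. Everything below is illustrative: the search-space knobs (`depth_multiplier`, `num_pes`, etc.), the accuracy proxy, and the latency cost model are placeholders, not the paper's actual spaces or evaluators.

```python
import random

# Hypothetical joint search space: model knobs and accelerator knobs.
# The paper's real spaces are richer; these names are assumptions.
MODEL_SPACE = {
    "depth_multiplier": [0.5, 1.0, 1.5],
    "width_multiplier": [0.75, 1.0, 1.25],
    "kernel_size": [3, 5],
}
HW_SPACE = {
    "num_pes": [32, 64, 128],    # number of processing elements
    "sram_kb": [256, 512, 1024], # on-chip buffer size
}

def sample(space, rng):
    """Draw one configuration uniformly at random from a search space."""
    return {k: rng.choice(v) for k, v in space.items()}

def proxy_accuracy(model):
    # Stand-in for training/evaluating a model: larger models score higher.
    return 0.70 + 0.05 * model["depth_multiplier"] * model["width_multiplier"]

def proxy_latency_ms(model, hw):
    # Stand-in cost model: compute grows with model size, shrinks with PEs;
    # a small on-chip buffer adds a fixed spill penalty.
    flops = 100 * model["depth_multiplier"] * model["width_multiplier"] * model["kernel_size"]
    penalty = 0.1 if hw["sram_kb"] < 512 else 0.0
    return flops / hw["num_pes"] + penalty

def joint_random_search(trials=200, latency_target_ms=5.0, seed=0):
    """Jointly sample (model, hardware) pairs; keep the most accurate
    candidate that meets the latency target."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        model, hw = sample(MODEL_SPACE, rng), sample(HW_SPACE, rng)
        if proxy_latency_ms(model, hw) > latency_target_ms:
            continue  # reject candidates that miss the latency target
        acc = proxy_accuracy(model)
        if best is None or acc > best[0]:
            best = (acc, model, hw)
    return best
```

The key point the sketch captures is observation 2 above: because latency depends on *both* the model and the accelerator configuration, searching the two spaces jointly can admit larger (more accurate) models than fixing either side in advance. A real system would replace random search with a learned controller and the proxies with trained-model accuracy and a cycle-accurate cost model.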

Updated: 2021-02-18