Enhancing Model Parallelism in Neural Architecture Search for Multi-device System
IEEE Micro (IF 3.6), Pub Date: 2020-09-01, DOI: 10.1109/mm.2020.3004538
Cheng Fu 1, Huili Chen 1, Zhenheng Yang 2, Farinaz Koushanfar 1, Yuandong Tian 2, Jishen Zhao 1

Neural architecture search (NAS) finds favorable network topologies for better task performance. Existing hardware-aware NAS techniques aim only to reduce inference latency on single-CPU/GPU systems, and the searched models can hardly be parallelized. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining high task accuracy. ColocNAS defines a new search space with elaborated connectivity to reduce device communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling that constructs a lookup table of inference latencies of various networks for online runtime approximation; 2) differentiable latency-aware NAS that simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device placement fine-tuning to further reduce the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness in reducing inference latency while preserving task accuracy.
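The first two phases can be illustrated with a minimal sketch: a profiled latency lookup table makes the latency term differentiable by weighting each candidate operation's measured latency with its softmax architecture probability, and that term is added to the task loss. All names, operations, and latency values below are hypothetical, not taken from the paper.

```python
import math

# Phase 1 (illustrative): offline-profiled lookup table mapping candidate
# operations to measured inference latency in milliseconds (made-up values).
LATENCY_LUT = {"conv3x3": 1.8, "conv5x5": 3.1, "skip": 0.1}

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_latency(alpha, ops=("conv3x3", "conv5x5", "skip")):
    """Differentiable latency estimate: softmax architecture weights
    alpha weight the profiled per-op latencies from the lookup table."""
    probs = softmax(alpha)
    return sum(p * LATENCY_LUT[op] for p, op in zip(probs, ops))

def nas_loss(task_loss, alpha, lam=0.1):
    """Phase 2 (illustrative) joint objective: task error plus a
    lambda-scaled expected-latency penalty."""
    return task_loss + lam * expected_latency(alpha)
```

With uniform logits the latency term is the mean of the profiled values; as training pushes the logits toward a single cheap op (e.g., `skip`), the penalty approaches that op's profiled latency, steering the search toward faster architectures.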
