QS-NAS: Optimally Quantized Scaled Architecture Search to Enable Efficient On-Device Micro-AI
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 4.6) Pub Date: 2021-11-15, DOI: 10.1109/jetcas.2021.3127932
Morteza Hosseini, Tinoosh Mohsenin

Because of their simple hardware requirements, low-bitwidth neural networks (NNs) have gained significant attention in recent years and have been extensively employed in state-of-the-art devices that seek efficiency and performance. Research has shown that scaled-up low-bitwidth NNs can reach accuracy levels on par with their full-precision counterparts. As a result, there is a trade-off between the quantization ( $q$ ) and scaling ( $s$ ) of NNs in maintaining accuracy. To capture that trade-off, in this paper we propose QS-NAS, a systematic approach to finding the quantization and scaling factors of an NN architecture that satisfy a targeted accuracy level and yield the least energy consumption per inference when deployed to hardware (an FPGA in this work). We first approximate the accuracy of an NN using a polynomial regression fitted to experiments over a span of $q$ and $s$ . Then, we design a hardware accelerator that scales with $P$ processing engines (PEs) and $M$ multipliers per PE, and infer that the configuration of the most energy-efficient hardware, as well as its energy per inference for an NN $\langle q,\,s\rangle $ , is in turn a function of $q$ and $s$ . Running the NNs with various $q$ and $s$ on our hardware, we approximate the energy consumption using another polynomial regression. Given the two approximators, we obtain a pair of $q$ and $s$ that minimizes the energy for a given targeted accuracy. The method was evaluated on the SVHN, CIFAR-10, and ImageNet datasets trained on VGG-like and MobileNet-192 architectures, and the optimized models were deployed to Xilinx FPGAs for fully on-chip processing. The implementation results outperform related work in terms of energy efficiency and/or power consumption while achieving similar or higher accuracy. The proposed optimization method is fast, simple, and scalable to emerging technologies. Moreover, it can be used on top of other AutoML frameworks to maximize the efficiency of running artificial intelligence on edge devices.
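The search described above reduces to fitting two regressors over observed $\langle q, s\rangle$ points and then solving a constrained minimization. The following is a minimal sketch in Python/NumPy, assuming hypothetical profiled data points and degree-2 polynomial surfaces for both accuracy and energy; the sample values, candidate grids, and variable names are illustrative assumptions, not taken from the paper.

import numpy as np

# Hypothetical measurements: a handful of trained/profiled <q, s> points.
# q = bitwidth, s = width-scaling factor; acc in %, energy in mJ/inference.
q_obs = np.array([1, 2, 4, 8, 2, 4, 8, 1])
s_obs = np.array([4.0, 2.0, 1.0, 1.0, 4.0, 2.0, 0.5, 2.0])
acc_obs = np.array([88.1, 90.5, 92.0, 92.3, 92.1, 92.4, 91.0, 85.0])  # illustrative
energy_obs = np.array([0.8, 0.9, 1.4, 3.1, 2.6, 2.2, 1.8, 0.5])       # illustrative

def design(q, s):
    """Degree-2 bivariate polynomial features in (q, s)."""
    q, s = np.asarray(q, float), np.asarray(s, float)
    return np.stack([np.ones_like(q), q, s, q * q, q * s, s * s], axis=-1)

# Fit the two polynomial approximators by least squares.
acc_coef, *_ = np.linalg.lstsq(design(q_obs, s_obs), acc_obs, rcond=None)
eng_coef, *_ = np.linalg.lstsq(design(q_obs, s_obs), energy_obs, rcond=None)

def predict(coef, q, s):
    return design(q, s) @ coef

# Grid search: minimize predicted energy s.t. predicted accuracy >= target.
target_acc = 92.0
best = None
for q in [1, 2, 4, 8]:                    # candidate bitwidths
    for s in np.linspace(0.5, 4.0, 15):   # candidate scaling factors
        if predict(acc_coef, q, s) >= target_acc:
            e = predict(eng_coef, q, s)
            if best is None or e < best[0]:
                best = (e, q, s)

if best:
    print(f"best <q, s> = <{best[1]}, {best[2]:.2f}>, "
          f"predicted energy = {best[0]:.2f} mJ/inference")

Because the fitted surfaces are cheap to evaluate, the grid search itself is negligible; the dominant cost of such a scheme is training and profiling the few $\langle q, s\rangle$ sample points used to fit the regressors.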

Updated: 2021-12-14