Multiobjective Reinforcement Learning-Based Neural Architecture Search for Efficient Portrait Parsing, IEEE Transactions on Cybernetics

IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2021-08-30, DOI: 10.1109/tcyb.2021.3104866
Bo Lyu, Shiping Wen, Kaibo Shi, Tingwen Huang

This article is dedicated to automatically exploring efficient portrait parsing models that are easy to deploy on edge computing or terminal devices. To trade off resource cost against performance, we design a multiobjective reinforcement learning (RL)-based neural architecture search (NAS) scheme that jointly balances accuracy, parameter count, FLOPs, and inference latency. Under varying hyperparameter configurations, the search procedure yields a set of objective-oriented architectures. Combining two-stage training with precomputed, memory-resident feature maps effectively reduces the time cost of the RL-based NAS method, so that approximately 1000 search iterations complete in two GPU days. To accelerate the convergence of lightweight candidate architectures, we incorporate knowledge distillation into training during the search, which also gives the RL controller a reasonable evaluation signal and helps it converge. We then fully train the best Pareto-optimal architectures, obtaining a series of portrait parsing models with only approximately 0.3M parameters. Furthermore, we directly transfer the architectures searched on CelebAMask-HQ (portrait parsing) to other portrait and face segmentation tasks, achieving state-of-the-art performance of 96.5% mIoU on EG1800 (portrait segmentation) and a 91.6% overall F1-score on HELEN (face labeling). That is, our models significantly surpass manually designed networks in accuracy while consuming fewer resources and delivering better real-time performance.
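
As an illustration of the multiobjective selection described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It filters a handful of candidate architectures down to the Pareto-optimal set over accuracy, parameters, FLOPs, and latency, and computes a simple weighted-sum reward of the kind an RL controller could be trained on. The `Candidate` record, the numeric values, the weighted-sum form, and its weights are all assumptions made for this example; the abstract does not specify the reward actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one searched architecture (values are made up)."""
    name: str
    miou: float        # accuracy, higher is better
    params_m: float    # parameters in millions, lower is better
    gflops: float      # compute cost, lower is better
    latency_ms: float  # inference latency, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.miou >= b.miou and a.params_m <= b.params_m
                and a.gflops <= b.gflops and a.latency_ms <= b.latency_ms)
    strictly_better = (a.miou > b.miou or a.params_m < b.params_m
                       or a.gflops < b.gflops or a.latency_ms < b.latency_ms)
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def scalar_reward(c: Candidate, w_params: float = 0.1,
                  w_flops: float = 0.1, w_latency: float = 0.01) -> float:
    """Assumed weighted-sum reward; the weights are illustrative hyperparameters."""
    return (c.miou - w_params * c.params_m
            - w_flops * c.gflops - w_latency * c.latency_ms)


candidates = [
    Candidate("A", 0.90, 0.30, 0.50, 10.0),
    Candidate("B", 0.88, 0.25, 0.40, 8.0),
    Candidate("C", 0.87, 0.35, 0.60, 12.0),  # worse than A on every objective
    Candidate("D", 0.92, 0.60, 1.20, 20.0),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

Changing the weights in `scalar_reward` shifts which region of the Pareto front the reward favors, which is one plausible reading of how different hyperparameter configurations can produce differently oriented architectures.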

Updated: 2021-08-30