Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning
IEEE Intelligent Systems (IF 6.4), Pub Date: 2020-07-01, DOI: 10.1109/mis.2020.2994942
Han Cha, Jihong Park, Hyesung Kim, Mehdi Bennis, Seong-Lyun Kim

Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging each agent's experience replay memory (RM). Since the RM contains all state observations and the action policy history, exchanging it may incur huge communication overhead while violating each agent's privacy. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience RM (ProxRM), in which policies are locally averaged with respect to proxy states that cluster actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which the ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and the communication cost, compared to the benchmark schemes: vanilla FRD, federated RL (FRL), and policy distillation.
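The abstract does not spell out how the ProxRM is constructed or how mixup is applied to it. The sketch below is a minimal illustration, assuming k-means clustering of observed states into proxy states, per-cluster averaging of each agent's local policy outputs, and the standard mixup rule with a Beta(alpha, alpha) mixing coefficient; all names (build_proxrm, mixup_proxrm, n_proxy_states, alpha) are hypothetical and not taken from the paper.

# Illustrative sketch of a proxy experience replay memory (ProxRM) and its
# mixup interpolation (MixFRD). Not the authors' code; clustering method,
# memory layout, and hyperparameters are assumptions for exposition only.

import numpy as np
from sklearn.cluster import KMeans

def build_proxrm(states, policies, n_proxy_states=16, seed=0):
    """Cluster actual states into proxy states and average the local
    policy vectors within each cluster.

    states   : (N, state_dim) array of observed states
    policies : (N, n_actions) array of local action distributions
    returns  : (proxy_states, avg_policies), each with n_proxy_states rows
    """
    km = KMeans(n_clusters=n_proxy_states, random_state=seed, n_init=10).fit(states)
    proxy_states = km.cluster_centers_
    avg_policies = np.zeros((n_proxy_states, policies.shape[1]))
    for c in range(n_proxy_states):
        members = policies[km.labels_ == c]
        if len(members) > 0:
            avg_policies[c] = members.mean(axis=0)  # locally averaged policy per proxy state
    return proxy_states, avg_policies

def mixup_proxrm(proxy_states, avg_policies, n_samples=64, alpha=0.2, seed=0):
    """Interpolate random pairs of ProxRM entries with mixup,
    lambda ~ Beta(alpha, alpha), applied to states and policies alike."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(proxy_states), size=n_samples)
    j = rng.integers(0, len(proxy_states), size=n_samples)
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    mixed_states = lam * proxy_states[i] + (1 - lam) * proxy_states[j]
    mixed_policies = lam * avg_policies[i] + (1 - lam) * avg_policies[j]
    return mixed_states, mixed_policies

# Example: 1,000 Cartpole-like states (4-dim) with 2-action policy outputs.
states = np.random.randn(1000, 4)
policies = np.random.dirichlet([1.0, 1.0], size=1000)
proxy_states, avg_policies = build_proxrm(states, policies)
mixed_states, mixed_policies = mixup_proxrm(proxy_states, avg_policies)
print(proxy_states.shape, mixed_states.shape)  # (16, 4) (64, 4)

Under these assumptions, an agent would upload only the (proxy state, averaged policy) pairs, 16 rows here, instead of its full replay memory, which is where the communication savings and the privacy benefit of FRD come from; MixFRD would additionally exchange the mixup-interpolated entries.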

Updated: 2020-07-01