QoE-Driven Content-Centric Caching With Deep Reinforcement Learning in Edge-Enabled IoT
IEEE Computational Intelligence Magazine (IF 10.3), Pub Date: 2019-11-01, DOI: 10.1109/mci.2019.2937608
Xiaoming He, Kun Wang, Wenyao Xu

When humans learn several skills to solve multiple tasks, they exhibit an extraordinary capacity to transfer knowledge between them. The authors present here the latest enhanced version of a bio-inspired reinforcement-learning (RL) modular architecture, the transfer expert RL (TERL) model, which can perform skill-to-skill knowledge transfer. The TERL architecture is based on an RL actor-critic model in which both the actor and the critic have a hierarchical structure inspired by the mixture-of-experts model: a gating network selects experts that specialize in learning the policies or value functions of different tasks. A key feature of TERL is the capacity of its gating networks to accumulate, in parallel, evidence on how well each expert can solve a new task, so as to increase the action responsibility of the best ones. A second key feature is the use of two different responsibility signals for the experts' functioning and learning: this allows multiple experts to be trained for each task, so that some of them can later be recruited to solve new tasks while avoiding catastrophic interference. The utility of the TERL mechanisms is shown in tests involving two simulated dynamic robot arms engaged in reaching tasks: a planar 2-DoF arm and a 3-D 4-DoF arm.
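The gating-and-responsibility idea in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' TERL implementation: a small set of linear critic experts fits a value function, a gating vector accumulates per-expert evidence (here, negative prediction-error magnitude), and two softmax temperatures produce a sharp responsibility signal for functioning (acting/prediction) and a softer one for learning, so more than one expert is trained. The class name `GatedCritic`, the linear experts, and the evidence rule are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class GatedCritic:
    """Toy gated mixture of linear value-function experts (illustrative only).

    Each expert predicts V(s) = w_i . s. The gate accumulates evidence for
    each expert (negative absolute prediction error) and converts it into
    softmax responsibilities. Mirroring the TERL idea, a low temperature
    gives a sharp signal for functioning (the best expert dominates the
    prediction), while a higher temperature gives a soft signal for
    learning (several experts keep being trained).
    """

    def __init__(self, n_experts, obs_dim, lr=0.1):
        self.w = rng.normal(0.0, 0.1, (n_experts, obs_dim))
        self.evidence = np.zeros(n_experts)  # accumulated per-expert evidence
        self.lr = lr

    def resp(self, temperature):
        # softmax over accumulated evidence (shifted for numerical stability)
        z = (self.evidence - self.evidence.max()) / temperature
        p = np.exp(z)
        return p / p.sum()

    def value(self, obs):
        # sharp responsibilities for functioning: best expert dominates
        return self.resp(0.1) @ (self.w @ obs)

    def update(self, obs, target):
        preds = self.w @ obs
        errors = target - preds
        # experts with smaller error gain evidence, hence responsibility
        self.evidence += -np.abs(errors)
        # softer responsibilities for learning: several experts improve
        r_learn = self.resp(1.0)
        self.w += self.lr * (r_learn * errors)[:, None] * obs[None, :]

# tiny demo: fit V(s) = 2*s0 - s1 with three gated experts
critic = GatedCritic(n_experts=3, obs_dim=2)
for _ in range(2000):
    s = rng.uniform(-1.0, 1.0, 2)
    critic.update(s, target=2.0 * s[0] - s[1])

probe = np.array([0.5, -0.5])  # true value: 1.5
```

Because the evidence gap between experts grows over time, the soft learning signal gradually concentrates on the best-fitting expert, which is the specialization behavior the gating network is meant to produce.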
