Faded-Experience Trust Region Policy Optimization for Model-Free Power Allocation in Interference Channel
IEEE Wireless Communications Letters (IF 4.6), Pub Date: 2020-01-01, DOI: 10.1109/lwc.2020.3045005
Mohammad G. Khoshkholgh , Halim Yanikomeroglu

Policy gradient reinforcement learning techniques enable an agent to directly learn an optimal action policy through interactions with the environment. Despite their advantages, however, they sometimes suffer from slow convergence. Inspired by human decision making, we work toward enhancing the convergence speed by augmenting the agent to memorize and reuse recently learned policies. We apply our method to trust-region policy optimization (TRPO), originally developed for locomotion tasks, and propose faded-experience (FE) TRPO. To substantiate its effectiveness, we adopt it to learn continuous power control in an interference channel when only noisy location information of the devices is available. Results indicate that FE-TRPO can nearly double the learning speed compared with TRPO. Importantly, our method neither increases the learning complexity nor imposes a performance loss.

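The abstract describes the core idea only at a high level: the agent memorizes recently learned policies and reuses them to speed up convergence. The sketch below is one plausible reading of that idea, assuming "faded experience" means an exponentially decaying average of the policy parameters produced by successive TRPO updates, which the agent then uses as its behavior policy. All names here (FadedPolicyMemory, fade, gaussian_policy_action) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the letter itself defines the exact FE-TRPO update;
# this assumes a simple exponentially faded average over recent policy weights.
import numpy as np


class FadedPolicyMemory:
    """Keeps an exponentially faded average of recently learned policy weights."""

    def __init__(self, fade=0.7):
        self.fade = fade          # weight given to the newest policy iterate
        self.avg_weights = None   # faded average over past policies

    def update(self, new_weights):
        """Fold the latest TRPO iterate into the faded memory and return it."""
        if self.avg_weights is None:
            self.avg_weights = new_weights.copy()
        else:
            self.avg_weights = (
                self.fade * new_weights + (1.0 - self.fade) * self.avg_weights
            )
        return self.avg_weights


def gaussian_policy_action(weights, state, noise_std=0.1, rng=None):
    """Continuous action (e.g., transmit power) from a linear-Gaussian policy."""
    rng = rng or np.random.default_rng()
    mean = weights @ state
    return mean + noise_std * rng.standard_normal()


# Toy usage: pretend each TRPO update returns new policy weights; the agent
# acts with the faded average instead of the newest iterate alone.
rng = np.random.default_rng(0)
memory = FadedPolicyMemory(fade=0.7)
state_dim = 4
weights = rng.standard_normal(state_dim)                        # initial policy
for step in range(5):
    weights = weights + 0.05 * rng.standard_normal(state_dim)   # stand-in for one TRPO step
    behavior_weights = memory.update(weights)
    state = rng.standard_normal(state_dim)                       # e.g., noisy device locations
    power = gaussian_policy_action(behavior_weights, state, rng=rng)
    print(f"step {step}: transmit power (unclipped) = {power:.3f}")
```

Averaging over recent iterates is one way to reuse "recently learned policies" without adding trainable parameters, which would be consistent with the claim that the method does not increase learning complexity; the precise combination rule used by FE-TRPO is given in the paper.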
Updated: 2020-01-01