A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments, ACM Computing Surveys

ACM Computing Surveys (IF 16.6), Pub Date: 2021-07-13, DOI: 10.1145/3459991
Sindhu Padakandla

Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains make them difficult to solve under the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms for the case where the underlying assumption of a stationary environment model is relaxed. This article provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by the RL agent or by finding a suitable policy for the RL agent that leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work, along with their categorization and their relative merits and demerits. We also review works that are tailored to application domains. Finally, we discuss future enhancements for this field.


Updated: 2021-07-13
