Batch mode reinforcement learning based on the synthesis of artificial trajectories, Annals of Operations Research


Batch mode reinforcement learning based on the synthesis of artificial trajectories
Annals of Operations Research (IF 4.4), Pub Date: 2012-11-15, DOI: 10.1007/s10479-012-1248-5
Raphael Fonteneau, Susan A Murphy, Louis Wehenkel, Damien Ernst


In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.
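The abstract does not say how artificial trajectories are built, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the sample is a list of one-step transitions (state, action, reward, next state) with scalar states and actions, and that an artificial trajectory is stitched together by repeatedly picking the unused transition closest to the current state and the action the policy would take. The function name `synthesize_return`, the distance measure, and the no-reuse rule are all assumptions made for illustration.

```python
def synthesize_return(transitions, policy, x0, horizon, n_trajectories):
    """Estimate a policy's return by stitching sampled one-step transitions.

    transitions: list of (state, action, reward, next_state) tuples of floats
                 (hypothetical data layout, assumed for this sketch).
    policy:      callable (t, state) -> action.
    Each transition is used at most once across all artificial trajectories
    (an assumption of this sketch, not something the abstract states).
    """
    pool = list(transitions)
    if len(pool) < horizon * n_trajectories:
        raise ValueError("not enough transitions to build the trajectories")

    returns = []
    for _ in range(n_trajectories):
        state = x0
        total = 0.0
        for t in range(horizon):
            action = policy(t, state)
            # Pick the unused transition whose (state, action) pair is
            # closest to where the artificial trajectory currently stands.
            best = min(
                range(len(pool)),
                key=lambda i: abs(pool[i][0] - state) + abs(pool[i][1] - action),
            )
            _, _, reward, next_state = pool.pop(best)
            total += reward
            # Continue from the sampled next state; no model is fitted.
            state = next_state
        returns.append(total)
    # Average the returns of the artificial trajectories.
    return sum(returns) / len(returns)


# Toy sample: four transitions on a line.
sample = [
    (0.0, 1.0, 1.0, 1.0),
    (1.0, 1.0, 2.0, 2.0),
    (0.0, -1.0, -1.0, -1.0),
    (-1.0, -1.0, -2.0, -2.0),
]
estimate = synthesize_return(sample, lambda t, x: 1.0, 0.0, 2, 1)
```

The point of the sketch is that no function approximator is trained: the estimate is assembled directly from the recorded transitions, which is the contrast the abstract draws.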


Updated: 2012-11-15
