On tight bounds for function approximation error in risk-sensitive reinforcement learning
Systems & Control Letters (IF 2.1), Pub Date: 2021-03-25, DOI: 10.1016/j.sysconle.2021.104899
Prasenjit Karmakar, Shalabh Bhatnagar

In this letter, we provide several informative tight error bounds for value function approximation in the risk-sensitive cost setting, where the cost of a given policy is represented using an exponential utility. The novelty of our approach is that we exploit the irreducibility of the underlying Markov chain (which yields sharper bounds via Perron–Frobenius eigenvectors) to derive new bounds, whereas earlier work relied primarily on the spectral variation bound, which holds for any matrix and hence does not make use of irreducibility. All our bounds include a perturbation term for large state spaces. We also present examples showing that the new bounds perform 90–100% better than the previously proposed spectral variation bound.
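For readers unfamiliar with the exponential-utility formulation, the following minimal Python sketch (not from the paper; the function name, the toy chain, and the cost values are illustrative assumptions) shows the standard Perron–Frobenius characterization that the abstract alludes to: for an irreducible chain with transition matrix P and per-state cost c under a fixed policy, the risk-sensitive average cost equals log(rho)/alpha, where rho is the Perron–Frobenius eigenvalue of the matrix with entries exp(alpha*c(i)) P(i,j); the associated positive eigenvector is the kind of object the new bounds exploit.

import numpy as np

def risk_sensitive_cost(P, c, alpha):
    # Build Q[i, j] = exp(alpha * c[i]) * P[i, j]; for an irreducible chain with
    # alpha > 0 this matrix is nonnegative and irreducible, so it has a simple
    # largest (Perron-Frobenius) eigenvalue with a positive eigenvector.
    Q = np.exp(alpha * c)[:, None] * P
    eigvals, eigvecs = np.linalg.eig(Q)
    k = np.argmax(eigvals.real)              # Perron-Frobenius eigenvalue is the largest
    rho = eigvals[k].real
    v = np.abs(eigvecs[:, k].real)           # corresponding eigenvector, taken positive
    return np.log(rho) / alpha, v / v.sum()

# Toy 3-state irreducible chain under a fixed policy (illustrative numbers only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
c = np.array([1.0, 2.0, 0.5])
cost, v = risk_sensitive_cost(P, c, alpha=0.1)
print("risk-sensitive average cost:", cost)
print("Perron-Frobenius eigenvector:", v)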




Updated: 2021-03-25