Prospect-theoretic Q-learning
Systems & Control Letters (IF 2.1) · Pub Date: 2021-08-14 · DOI: 10.1016/j.sysconle.2021.105009
Vivek S. Borkar, Siddharth Chandak
We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and under-represents losses relative to a reference point. We analyze the asymptotic behavior of the scheme via its limiting differential equation, using the theory of monotone dynamical systems. Specifically, we show convergence to equilibria and establish some qualitative facts about the equilibria themselves.
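To make the scheme concrete, here is a minimal sketch of a tabular Q-learning iteration in which the one-step target is perceived through a distortion that accentuates gains and is steeper on losses, in the spirit of the Kahneman-Tversky value function. The specific distortion, its parameters (`alpha`, `lam`, the reference point), the toy MDP, and the step-size schedule are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def distort(x, alpha=0.88, lam=2.25, ref=0.0):
    """Kahneman-Tversky style value function (assumed form):
    concave on gains, steeper (loss-averse) on losses,
    measured relative to a reference point."""
    d = x - ref
    return d ** alpha if d >= 0 else -lam * (-d) ** alpha

def prospect_q_learning(P, R, gamma=0.9, steps=20000, seed=0):
    """Tabular Q-learning where the sampled one-step target is
    passed through distort(.) before the update -- a hedged
    reading of 'perceiving a distorted future reward'.
    P: transition kernel, shape (nS, nA, nS); R: rewards, shape (nS, nA)."""
    rng = np.random.default_rng(seed)
    nS, nA = R.shape
    Q = np.zeros((nS, nA))
    s = 0
    for n in range(1, steps + 1):
        a = int(rng.integers(nA))            # uniform exploration
        s2 = int(rng.choice(nS, p=P[s, a]))  # sample next state
        # distorted perception of the usual Bellman target
        target = distort(R[s, a] + gamma * Q[s2].max())
        step = 1.0 / (1 + n // 100)          # slowly decreasing step size
        Q[s, a] += step * (target - Q[s, a])
        s = s2
    return Q
```

Because the distortion is monotone, the associated distorted Bellman operator remains order-preserving, which is what lets the monotone-dynamical-systems machinery apply to the limiting ODE of this iteration.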