Reachability and Safety Objectives in Markov Decision Processes on Long but Finite Horizons
Journal of Optimization Theory and Applications (IF 1.9) Pub Date: 2020-05-18, DOI: 10.1007/s10957-020-01681-2
Galit Ashkenazi-Golan, János Flesch, Arkadi Predtetchinski, Eilon Solan

We consider discrete-time Markov decision processes in which the decision maker is interested in long but finite horizons. First, we consider the reachability objective: the decision maker’s goal is to reach a specific target state with the highest possible probability. A strategy is said to overtake another strategy if it gives a strictly higher probability of reaching the target state on all sufficiently large but finite horizons. We prove that there exists a pure stationary strategy that is not overtaken by any pure strategy nor by any stationary strategy, under some condition on the transition structure and under genericity, respectively. A strategy that is not overtaken by any other strategy, called an overtaking optimal strategy, does not always exist. We provide sufficient conditions for its existence. Next, we consider the safety objective: the decision maker’s goal is to avoid a specific state with the highest possible probability. We argue that the results proven for the reachability objective extend to this model.
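
The overtaking comparison described in the abstract can be illustrated numerically. The following is a minimal sketch and is not taken from the paper. It assumes a hypothetical three-state example (`s`, `target`, `sink`) with two pure stationary strategies, and it reads "reaching the target on horizon T" as "having reached the target by time T". The helper names `reach_probabilities` and `dominates_from` are made up for illustration. A check up to a finite maximum horizon can only suggest overtaking, not prove it.

```python
def reach_probabilities(chain, start, target, horizon):
    """Probability of having reached `target` by each time t = 0..horizon
    in the Markov chain induced by a fixed stationary strategy."""
    dist = {state: 0.0 for state in chain}
    dist[start] = 1.0
    probs = []
    for _ in range(horizon + 1):
        probs.append(dist[target])
        nxt = {state: 0.0 for state in chain}
        for state, mass in dist.items():
            if state == target:
                nxt[target] += mass  # target is treated as absorbing
            else:
                for succ, p in chain[state].items():
                    nxt[succ] += mass * p
        dist = nxt
    return probs


def dominates_from(p, q):
    """Smallest t0 with p[t] > q[t] for every checked t >= t0, or None."""
    t0 = None
    for t in range(len(p) - 1, -1, -1):
        if p[t] > q[t]:
            t0 = t
        else:
            break
    return t0


# Hypothetical example. Strategy A gambles once: 0.5 to the target, 0.5 to a sink.
chain_a = {"s": {"target": 0.5, "sink": 0.5},
           "sink": {"sink": 1.0}, "target": {"target": 1.0}}
# Strategy B retries forever: 0.1 to the target, 0.9 back to s.
chain_b = {"s": {"target": 0.1, "s": 0.9},
           "sink": {"sink": 1.0}, "target": {"target": 1.0}}

MAX_HORIZON = 50  # assumption: a finite cutoff for the numerical check
probs_a = reach_probabilities(chain_a, "s", "target", MAX_HORIZON)
probs_b = reach_probabilities(chain_b, "s", "target", MAX_HORIZON)

# B is strictly better on every checked horizon from this point on.
print(dominates_from(probs_b, probs_a))  # prints 7
```

In this example strategy A is better on short horizons (up to 6 steps), while strategy B is strictly better from horizon 7 onward, since 1 - 0.9^T exceeds 0.5 for all T >= 7. So B overtakes A in the sense defined above.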

Updated: 2020-05-18