Rank-1 transition uncertainties in constrained Markov decision processes
European Journal of Operational Research (IF 6.4), Pub Date: 2024-04-24, DOI: 10.1016/j.ejor.2024.04.023
V Varagapriya, Vikas Vikram Singh, Abdel Lisser

We consider an infinite-horizon discounted constrained Markov decision process (CMDP) with uncertain transition probabilities. We assume that the uncertainty in the transition probabilities has a rank-1 matrix structure and that the underlying uncertain parameters belong to a polytope. We formulate the uncertain CMDP problem using a robust optimization framework. To derive a reformulation of the robust CMDP problem, we restrict attention to the class of stationary policies and show that the restricted problem is equivalent to a bilinear programming problem. We provide a simple example in which a Markov policy performs better than the optimal policy in the class of stationary policies, implying that, unlike in the classical CMDP problem, an optimal policy of the robust CMDP problem need not lie in the class of stationary policies. For the case of a single uncertain parameter, we propose sufficient conditions under which an optimal policy of the restricted robust CMDP problem is unaffected by uncertainty. Numerical experiments are performed on randomly generated instances of a machine replacement problem and of a well-known class of problems called Garnets.
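As a notational aid only, the following LaTeX sketch spells out one standard way to write a robust discounted CMDP with a rank-1 transition perturbation; the symbols \hat{P}, b, c_u, the polytope U, the constraint bounds d_k and the direction of the inequalities are illustrative assumptions consistent with the abstract, not the paper's exact formulation.

% Sketch of a robust discounted CMDP with rank-1 transition uncertainty.
% Standard notation; the perturbation form and constraint direction are assumptions.
\begin{align*}
  & P_u(s' \mid s, a) \;=\; \hat{P}(s' \mid s, a) + b(s, a)\, c_u(s'),
      && u \in U \ \text{(a polytope of uncertain parameters)}, \\
  & V_k(\pi, u) \;=\; \mathbb{E}^{\pi, P_u}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_k(s_t, a_t) \right],
      && k = 0, 1, \dots, K, \\
  & \max_{\pi \in \Pi_{\mathrm{s}}} \ \min_{u \in U} \ V_0(\pi, u)
    \quad \text{s.t.} \quad V_k(\pi, u) \ge d_k \quad \forall\, u \in U, \ k = 1, \dots, K,
\end{align*}

where \Pi_{\mathrm{s}} denotes the class of stationary policies to which the reformulation described in the abstract is restricted, and the perturbation b(s,a)\,c_u(s') gives the transition-uncertainty matrix its rank-1 structure.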

Updated: 2024-04-24