Detection-averse optimal and receding-horizon control for Markov decision processes
Automatica ( IF 6.4 ) Pub Date : 2020-09-28 , DOI: 10.1016/j.automatica.2020.109278
Nan Li , Ilya Kolmanovsky , Anouck Girard

In this paper, we consider a Markov decision process (MDP) in which the ego agent intends to hide its state from detection by an adversary while pursuing a nominal objective. After formulating the detection-averse MDP problem, we first describe a value iteration (VI) approach to exactly solve it. To overcome the “curse of dimensionality” and thus gain scalability to larger-sized problems, we then propose a receding-horizon optimization (RHO) approach to compute approximate solutions. Numerical examples are reported to illustrate and compare the VI and RHO approaches, and show the potential of the proposed problem formulation for practical applications.
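As context for the value iteration (VI) approach mentioned in the abstract, the sketch below shows standard value iteration on a small generic MDP: the Bellman optimality operator is applied until the value function converges. The toy transition matrices, rewards, and discount factor are illustrative assumptions, not taken from the paper, and the sketch omits the detection-averse terms specific to this work.

```python
import numpy as np

# Hypothetical toy MDP (illustrative only, not from the paper):
# 3 states, 2 actions, discount factor gamma.
gamma = 0.9

# P[a][s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]],
    [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
])
# R[s, a] = expected immediate reward in state s under action a.
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 2.0]])

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator to convergence;
    return the optimal value function and a greedy policy."""
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a][s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
```

Because VI sweeps the entire state space at every iteration, its cost grows with the number of states, which is the "curse of dimensionality" the paper's receding-horizon optimization is designed to avoid: RHO instead re-solves a finite-horizon problem from the current state at each step.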




Updated: 2020-09-28