Poisoning Finite-Horizon Markov Decision Processes at Design Time, Computers & Operations Research

Poisoning Finite-Horizon Markov Decision Processes at Design Time
Computers & Operations Research (IF 4.1), Pub Date: 2021-05-01, DOI: 10.1016/j.cor.2020.105185
William N. Caballero, Phillip R. Jenkins, Andrew J. Keith

Abstract: The contemporary decision-making environment is becoming increasingly automated. Developments in artificial intelligence, machine learning, and operations research have increased the prevalence of computer systems in decision-making tasks across a myriad of applications. Markov decision processes (MDPs) are used in a variety of system controllers, and attacks against them are of particular interest, even though this problem structure is relatively understudied in the adversarial learning literature. In this research, we therefore consider the finite-horizon MDP poisoning problem, in which an adversary perturbs a decision maker's baseline MDP formulation to induce desired behavior while balancing the risk of attack detection. We formally define the associated mathematical programming formulation as a mixed-integer bilevel programming problem. We provide a single-level representation that some commercial global solvers can handle, but because their performance is frequently inadequate, we develop gradient-based, gradient-free, and bifurcation heuristic solution methodologies that include self-tuning extensions. The performance of these algorithms is explored on a wide array of sample problem instances to determine their relative efficacy in terms of solution quality and computational effort for different finite-horizon MDP structures.
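
To make the poisoning idea concrete, the following is a minimal sketch and not the authors' bilevel formulation or any of their heuristics. It solves a toy finite-horizon MDP by backward induction, then applies a hand-picked reward perturbation and re-solves. The two-state, two-action instance, the perturbation `delta`, and the use of the L1 norm of `delta` as a stand-in for detection risk are all illustrative assumptions.

```python
import numpy as np


def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP by backward induction.

    P: array of shape (A, S, S), P[a, s, s2] = Pr(s2 | s, a).
    R: array of shape (S, A), immediate reward for action a in state s.
    Returns (policy, V) with policy of shape (horizon, S)
    and V of shape (horizon + 1, S); terminal values are zero.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros((horizon + 1, n_states))
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum over s2 of P[a, s, s2] * V[t + 1, s2]
        Q = R + np.einsum("ast,t->sa", P, V[t + 1])
        policy[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return policy, V


# Hypothetical toy instance: action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [1.0, 0.0]],  # action 1
])
R = np.array([
    [1.0, 0.0],  # state 0
    [0.0, 0.5],  # state 1
])
horizon = 2

baseline_policy, _ = backward_induction(P, R, horizon)

# Assumed attack: inflate the reward for taking action 1 in state 0.
delta = np.array([
    [0.0, 1.2],
    [0.0, 0.0],
])
poisoned_policy, _ = backward_induction(P, R + delta, horizon)

# Assumed proxy for detection risk: total size of the perturbation.
perturbation_cost = np.abs(delta).sum()

print("baseline policy:\n", baseline_policy)
print("poisoned policy:\n", poisoned_policy)
print("perturbation cost:", perturbation_cost)
```

In this toy instance the perturbation changes only the final-stage decision in state 0. An adversary in the paper's setting would instead search over perturbations, trading off the induced behavior against detection risk, which is what makes the problem bilevel: the decision maker's optimal policy is itself the solution of an inner optimization.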


Updated: 2021-05-01
