Simplified Risk-aware Decision Making with Belief-dependent Rewards in Partially Observable Domains
Artificial Intelligence (IF 14.4), Pub Date: 2022-08-27, DOI: 10.1016/j.artint.2022.103775
Andrey Zhitnikov, Vadim Indelman

Incorporating risk awareness increases the complexity of decision-making algorithms, making such problem formulations severely difficult to solve online. Our approach centers on the distribution of the return in the challenging setting of continuous domains under partial observability. This paper proposes a simplification framework that eases the computational burden while providing guarantees on the impact of the simplification. On top of this framework, we present novel stochastic bounds on the return that apply to any reward function. Further, we consider the impact of simplification on decision making with risk-averse objectives, which, to the best of our knowledge, has not been investigated thus far. In particular, we prove that stochastic bounds on the return yield deterministic bounds on Value at Risk (VaR). The second part of the paper focuses on the joint distribution of a pair of returns given a pair of candidate policies, thereby, for the first time, accounting for the correlation between these returns. Here, we propose a novel risk-averse objective and apply our simplification paradigm. Moreover, we present a novel tool, the probabilistic loss (PLoss), to completely characterize the impact of simplification for any objective operator in this setting. We provably bound the cumulative and tail distribution functions of PLoss using PbLoss, providing such a characterization online using only the simplified problem. In addition, we use this tool to offer deterministic guarantees for the simplification in the context of our novel risk-averse objective. We employ the proposed framework on a particular simplification technique: reducing the number of samples used for reward calculation or belief representation within planning. Finally, we verify the advantages of our approach through extensive simulations.
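The key idea that stochastic bounds on the return yield deterministic bounds on VaR can be illustrated with a minimal sketch (not the authors' implementation; the samples, the constant bound gap `eps`, and the VaR sign convention are illustrative assumptions). Because the empirical quantile is monotone, samplewise lower and upper bounds on the return translate directly into deterministic bounds on the empirical VaR:

```python
import numpy as np

def empirical_var(returns, alpha):
    """Empirical Value at Risk at level alpha, taken here as the
    negative alpha-quantile of the return samples (one common convention)."""
    return -np.quantile(np.asarray(returns), alpha)

rng = np.random.default_rng(0)
r = rng.normal(loc=1.0, scale=2.0, size=10_000)  # hypothetical return samples
eps = 0.3                                        # assumed samplewise simplification gap
lower, upper = r - eps, r + eps                  # stochastic bounds realized per sample

# Monotonicity of the quantile gives deterministic VaR bounds:
# upper-bounded returns give a lower bound on VaR, and vice versa.
var_r = empirical_var(r, alpha=0.05)
var_lo = empirical_var(upper, alpha=0.05)        # lower bound on VaR
var_hi = empirical_var(lower, alpha=0.05)        # upper bound on VaR
assert var_lo <= var_r <= var_hi
```

With a constant gap the bound interval has width exactly `2 * eps`; in the paper's setting the gap comes from the simplified problem (e.g., fewer samples in the belief or reward computation), so only the simplified quantities are needed online to certify the bounds.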




Updated: 2022-08-27