Risk-Sensitive Control of Markov Decision Processes: A Moment-Based Approach with Target Distributions
Computers & Operations Research (IF 4.1), Pub Date: 2020-11-01, DOI: 10.1016/j.cor.2020.104997
Rainer Schlosser

Abstract. In many revenue management applications, risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. Existing approaches use utility functions, chance constraints, or (conditional) value-at-risk considerations to shape the distribution of rewards in a preferred way. Nevertheless, these common techniques are often insufficiently flexible and typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm that identifies feedback policies approximating predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and flexibility of our approach for different dynamic pricing scenarios.
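The "recursive formulations of higher moments" mentioned in the abstract can be illustrated with a standard backward recursion: for a fixed feedback policy in a finite-horizon MDP, the first and second moments of the reward-to-go satisfy M1 = E[r + M1'] and M2 = E[r² + 2·r·M1' + M2'], so both can be computed without enlarging the state space. The sketch below is only a minimal illustration of that idea under assumed data (random transition kernels `P`, deterministic rewards `R`, horizon `T`); it is not the paper's algorithm, which additionally optimizes the policy against a target distribution.

```python
import numpy as np

# Illustrative sketch: backward recursion for the first two moments of
# cumulative reward under a FIXED feedback policy in a finite-horizon MDP.
# All names (P, R, policy, T) are assumptions for this example.

rng = np.random.default_rng(0)
S, A, T = 4, 2, 10                                   # states, actions, horizon
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                    # P[a, s, s'] = P(s' | s, a)
R = rng.random((A, S))                               # immediate reward r(a, s)
policy = rng.integers(0, A, size=(T, S))             # feedback policy pi(t, s)

M1 = np.zeros(S)                                     # E[reward-to-go]
M2 = np.zeros(S)                                     # E[(reward-to-go)^2]
for t in reversed(range(T)):
    a = policy[t]                                    # action chosen in each state
    r = R[a, np.arange(S)]                           # immediate rewards at time t
    Pt = P[a, np.arange(S)]                          # row s: P(. | s, pi(t, s))
    m1_next = Pt @ M1                                # E[G'] given current state
    m2_next = Pt @ M2                                # E[G'^2] given current state
    # Expand E[(r + G')^k] with the binomial theorem (r deterministic here):
    M2 = r**2 + 2 * r * m1_next + m2_next
    M1 = r + m1_next

var = M2 - M1**2                                     # per-state variance of total reward
```

With both moments in hand, the per-state variance follows immediately, which is the kind of risk quantity a moment-based policy search could then steer toward a target distribution.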
