Constrained no-regret learning
Journal of Mathematical Economics (IF 1.3), Pub Date: 2020-05-01, DOI: 10.1016/j.jmateco.2020.02.002
Ye Du, Ehud Lehrer

Abstract We investigate a dynamic decision-making problem with constraints. The decision maker is free to take any action as long as the empirical frequency of the actions played does not violate pre-specified constraints; in case of a violation, the decision maker is penalized. We introduce the constrained no-regret learning model, in which the set of alternative strategies against which a dynamic decision policy is compared is the set of stationary mixed actions that satisfy all the constraints. We show that there exists a strategy with the following properties: (i) after an unavoidable deterministic grace period, it guarantees that there are no violations at all; (ii) for an arbitrarily small constant $\epsilon > 0$, it achieves a convergence rate of $T^{-\frac{1-\epsilon}{2}}$, which improves on the $O(T^{-1/3})$ convergence rate of Mannor et al. (2009).
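The abstract does not specify the form of the constraints or the authors' strategy. As a minimal sketch, assuming full-information payoffs and constraints of the hypothetical form "action i may be played with empirical frequency at most caps[i]", the Python below computes the two quantities the model tracks: which constraints the empirical action frequencies violate, and the constrained regret, i.e. the average payoff of the best stationary mixed action satisfying the constraints minus the policy's realized average payoff. It illustrates the benchmark only and is not the no-regret strategy of Du and Lehrer; all function names are hypothetical.

```python
from typing import List, Sequence

def best_constrained_mixed_action(avg_payoff: Sequence[float],
                                  caps: Sequence[float]) -> List[float]:
    """Maximize sum_i p[i] * avg_payoff[i] subject to sum_i p[i] = 1 and
    0 <= p[i] <= caps[i]. For these (assumed) cap constraints, filling the
    highest-payoff actions first, up to their caps, is optimal."""
    if sum(caps) < 1.0 - 1e-12:
        raise ValueError("constraint set is empty: caps sum to less than 1")
    p = [0.0] * len(avg_payoff)
    remaining = 1.0
    # Visit actions from highest to lowest average payoff.
    for i in sorted(range(len(avg_payoff)), key=lambda k: avg_payoff[k], reverse=True):
        take = min(caps[i], remaining)
        p[i] = take
        remaining -= take
        if remaining <= 0.0:
            break
    return p

def constrained_regret(payoffs: Sequence[Sequence[float]],
                       actions: Sequence[int],
                       caps: Sequence[float]) -> float:
    """payoffs[t][i] is the payoff action i would have earned in round t;
    actions[t] is the action the policy actually played in round t."""
    T, n = len(actions), len(caps)
    # A stationary mixed action p earns sum_i p[i] * avg[i] on average.
    avg = [sum(payoffs[t][i] for t in range(T)) / T for i in range(n)]
    realized = sum(payoffs[t][actions[t]] for t in range(T)) / T
    p = best_constrained_mixed_action(avg, caps)
    benchmark = sum(pi * ai for pi, ai in zip(p, avg))
    return benchmark - realized

def violations(actions: Sequence[int], caps: Sequence[float]) -> List[int]:
    """Indices of actions whose empirical frequency exceeds its cap."""
    T = len(actions)
    freq = [list(actions).count(i) / T for i in range(len(caps))]
    return [i for i, f in enumerate(freq) if f > caps[i] + 1e-12]

if __name__ == "__main__":
    payoffs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    caps = [0.5, 1.0]  # hypothetical: action 0 at most half the time
    print(constrained_regret(payoffs, [1, 1, 0, 0], caps))
    print(violations([0, 0, 0, 1], caps))
```

In the usage example, action 0 has the higher average payoff (0.75 versus 0.25), but the cap limits the benchmark to the mixed action (0.5, 0.5), whose average payoff is 0.5; the played sequence earns 0.25, so the constrained regret is 0.25. The greedy fill is exact only for per-action caps; general linear constraints on the mixed action would call for a linear-programming solver instead.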

Updated: 2020-05-01