Learning to Personalize Treatments When Agents Are Strategic
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-11-12. DOI: arxiv-2011.06528. Author: Evan Munro.
There is increasing interest in using observed individual-level data to
formulate personalized policy. Examples of this include heterogeneous pricing,
individualized credit offers, and targeted social programs. This paper provides
a general model of how personalized policy creates incentives for individuals
to modify their behavior to obtain a better treatment. For a given planner
objective, we show that standard estimators based on repeated risk minimization
produce a suboptimal policy. We propose a dynamic experiment that estimates the
optimal treatment allocation function when agents are strategic and has regret
that decays at a linear rate. A key insight is that random variation in how
treatment assignment depends on observed characteristics is required, and that
randomized treatment assignment alone is not sufficient to identify the optimal
policy. In simulations and in a small MTurk experiment, we show that this
experimental method outperforms alternative methods that do not learn strategic effects.
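The identification insight above can be illustrated with a toy simulation. This is not the paper's method, only a stylized sketch under assumed functional forms: agents report a characteristic x = t + e, where t is their true type and e is gaming effort chosen in response to a treatment rule whose slope a is randomized across agents; the quadratic-cost best response e = a/c is an assumption made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True (unmanipulated) characteristic of each agent.
t = rng.normal(size=n)

# Stylized setup: the planner's rule makes treatment more likely when the
# *reported* characteristic x is higher, with a per-agent randomized slope
# `a`. Agents value treatment and pay a quadratic gaming cost (c/2) * e^2,
# giving a best-response effort proportional to a (assumed form, for
# illustration only).
c = 2.0
a = rng.uniform(0.0, 2.0, size=n)   # randomized variation in the rule
effort = a / c                      # stylized strategic best response
x_reported = t + effort

# Because `a` varies randomly across agents, regressing the reported
# characteristic on `a` identifies the strategic response d(effort)/da = 1/c.
# If every agent faced the same fixed rule (constant a), this slope would be
# unidentified even with randomized treatment assignment.
slope, intercept = np.polyfit(a, x_reported, 1)
print(f"estimated strategic response: {slope:.3f} (truth: {1 / c:.3f})")
```

The point of the sketch is the last comment: randomizing *how* assignment depends on observed characteristics creates the exogenous variation needed to learn the strategic response, whereas a coin flip over treatment alone leaves agents' gaming incentives, and hence the optimal rule, unidentified.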
Updated: 2020-11-13