A Bayesian Machine Learning Approach for Optimizing Dynamic Treatment Regimes
Journal of the American Statistical Association (IF 3.0). Pub Date: 2018-07-03. DOI: 10.1080/01621459.2017.1340887
Thomas A. Murray, Ying Yuan, Peter F. Thall
ABSTRACT Medical therapy often consists of multiple stages, with a treatment chosen by the physician at each stage based on the patient’s history of treatments and clinical outcomes. These decisions can be formalized as a dynamic treatment regime. This article describes a new approach for optimizing dynamic treatment regimes, which bridges the gap between Bayesian inference and existing approaches, like Q-learning. The proposed approach fits a series of Bayesian regression models, one for each stage, in reverse sequential order. Each model uses as a response variable the remaining payoff assuming optimal actions are taken at subsequent stages, and as covariates the current history and relevant actions at that stage. The key difficulty is that the optimal decision rules at subsequent stages are unknown, and even if these decision rules were known the relevant response variables may be counterfactual. However, posterior distributions can be derived from the previously fitted regression models for the optimal decision rules and the counterfactual response variables under a particular set of rules. The proposed approach averages over these posterior distributions when fitting each regression model. An efficient sampling algorithm for estimation is presented, along with simulation studies that compare the proposed approach with Q-learning. Supplementary materials for this article are available online.
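The backward recursion the abstract describes can be illustrated with a minimal sketch. The code below implements plain frequentist Q-learning for a two-stage regime, which shares the same structure (fit the last stage first, form a pseudo-outcome under the optimal last-stage action, then fit the earlier stage); it is not the authors' Bayesian method, which instead averages over posterior distributions of the subsequent-stage rules and counterfactual responses. The data-generating model and the names `x1`, `a1`, `x2`, `a2` are hypothetical.

```python
import numpy as np

# Simulate a two-stage trial with binary actions (hypothetical model).
rng = np.random.default_rng(0)
n = 500

x1 = rng.normal(size=n)                 # baseline covariate
a1 = rng.integers(0, 2, size=n)         # stage-1 action (randomized)
x2 = 0.5 * x1 + rng.normal(size=n)      # intermediate covariate
a2 = rng.integers(0, 2, size=n)         # stage-2 action (randomized)
# Final payoff: stage-1 effect a1*(1 + x1), stage-2 effect a2*(0.5 - x2).
y = x1 + a1 * (1 + x1) + a2 * (0.5 - x2) + rng.normal(size=n)

def fit_ols(X, y):
    """Least-squares coefficients (a stand-in for a Bayesian posterior mean)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Stage 2: regress the payoff on the full history and the stage-2 action.
def design2(a):
    return np.column_stack([np.ones(n), x1, a1, a1 * x1, x2, a, a * x2])

beta2 = fit_ols(design2(a2), y)

# Pseudo-outcome: predicted payoff had the optimal stage-2 action been taken.
y_tilde = np.maximum(design2(np.zeros(n)) @ beta2,
                     design2(np.ones(n)) @ beta2)

# Stage 1: regress the pseudo-outcome on the stage-1 history and action.
X1 = np.column_stack([np.ones(n), x1, a1, a1 * x1])
beta1 = fit_ols(X1, y_tilde)

# Estimated stage-1 rule: treat when the fitted action contrast is positive.
d1 = (beta1[2] + beta1[3] * x1 > 0).astype(int)
```

Replacing each `fit_ols` call with a Bayesian regression, and the single pseudo-outcome with an average over its posterior distribution, gives the flavor of the proposed approach.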

Updated: 2018-07-03