Q-learning for estimating optimal dynamic treatment rules from observational data.
The Canadian Journal of Statistics ( IF 0.8 ) Pub Date : 2012-11-07 , DOI: 10.1002/cjs.11162
Erica E. M. Moodie, Bibhas Chakraborty, Michael S. Kramer

The area of dynamic treatment regimes (DTR) aims to make inference about adaptive, multistage decision‐making in clinical practice. A DTR is a set of decision rules, one per interval of treatment, where each decision is a function of treatment and covariate history that returns a recommended treatment. Q‐learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q‐learning can be used for both randomized and observational data, the focus in the literature thus far has been exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. The methods are examined under various settings including non‐regular scenarios. We illustrate the methods in examining the effect of breastfeeding on vocabulary testing, based on data from the Promotion of Breastfeeding Intervention Trial. The Canadian Journal of Statistics 40: 629–645; 2012 © 2012 Statistical Society of Canada
