


Statistical Inference for Online Decision Making via Stochastic Gradient Descent
Journal of the American Statistical Association (IF 3.0), Pub Date: 2020-11-19, DOI: 10.1080/01621459.2020.1826325
Haoyu Chen, Wenbin Lu, Rui Song


Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. Big data has made this easier than before, but it also brings new challenges. Since the decision rule should be updated at every step, an offline update that uses all the historical data is inefficient in both computation and storage. To this end, we propose a completely online algorithm that makes decisions and updates the decision rule online via stochastic gradient descent. It is not only efficient but also supports all kinds of parametric reward models. Focusing on the statistical inference of online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator we use to estimate the optimal value. Online plug-in estimators for the variance of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are tested by simulations and a real data application to news article recommendation.
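The abstract does not spell out the update rule, so what follows is only a minimal Python sketch of the general idea under assumptions of our own, not the authors' algorithm: two actions, a linear reward model per action that receives one squared-error SGD step per observation, epsilon-greedy exploration with rate `eps`, and a step size `lr0 / t**alpha`. All of these (and the function name `run_online`) are hypothetical choices. The loop also keeps a running inverse-probability-weighted (IPW) mean for the value of the current estimated rule, with a naive sample-variance standard error standing in for the paper's plug-in variance estimator. Only the current observation and a few running sums are stored, which is the point of a fully online update.

```python
import numpy as np


def run_online(contexts, reward_fn, eps=0.1, lr0=0.5, alpha=0.7, seed=0):
    """Hypothetical sketch of a fully online decision loop.

    Assumptions (not taken from the paper): two actions, a linear reward
    model per action, squared-error loss, epsilon-greedy exploration and a
    step size lr0 / t**alpha. Requires at least two contexts.
    """
    rng = np.random.default_rng(seed)
    n, d = contexts.shape
    beta = np.zeros((2, d))      # beta[a] = reward-model coefficients for action a
    mean, m2 = 0.0, 0.0          # Welford running mean and sum of squared deviations
    for t in range(1, n + 1):
        x = contexts[t - 1]
        greedy = int(x @ beta[1] > x @ beta[0])   # current estimated decision rule
        # epsilon-greedy: with probability eps pick an action uniformly at random
        a = int(rng.integers(2)) if rng.random() < eps else greedy
        prob = 1 - eps / 2 if a == greedy else eps / 2   # P(A_t = a | x_t)
        r = reward_fn(x, a, rng)
        # IPW term: reward counts only when the action agrees with the rule
        term = r * (a == greedy) / prob
        delta = term - mean
        mean += delta / t
        m2 += delta * (term - mean)
        # one SGD step on 0.5 * (x @ beta[a] - r)**2 using this sample only
        lr = lr0 / t ** alpha
        beta[a] -= lr * (x @ beta[a] - r) * x
    se = np.sqrt(m2 / (n - 1) / n)   # naive standard error of the IPW mean
    return beta, mean, se


def reward_fn(x, a, rng):
    # Simulated truth (hypothetical): action 1 is better exactly when x[1] > 0
    true_mean = x[0] + (x[1] if a == 1 else -x[1])
    return true_mean + 0.1 * rng.standard_normal()


gen = np.random.default_rng(1)
X = np.column_stack([np.ones(5000), gen.standard_normal(5000)])  # intercept + one feature
beta, value, se = run_online(X, reward_fn)
print(beta)
print(value, value - 1.96 * se, value + 1.96 * se)
```

In this toy setup the optimal value is 1 + sqrt(2/pi), about 1.80. The running IPW mean averages over every rule used along the way, including the poor early ones, so it is expected to land slightly below that. A step size decaying like `t**-alpha` with `alpha` between 0.5 and 1 is a common choice for SGD, but the exponent here is arbitrary.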


Updated: 2020-11-19
