Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits, IEEE Transactions on Signal Processing



IEEE Transactions on Signal Processing (IF 5.4), Pub Date: 2021-03-31, DOI: 10.1109/tsp.2021.3070201
Xiao Xu, Qing Zhao

An adversarial multi-armed bandit problem with memory constraints is studied, in which the memory available for storing arm statistics is sublinear in the number of arms. A hierarchical learning framework is developed that offers a sequence of operating points on the tradeoff curve between regret order and memory complexity. Sublinear regret orders are established for the framework under both the weak regret and shifting regret notions. This work appears to be the first on memory-constrained bandit problems in the adversarial setting.
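
For readers unfamiliar with the two regret notions named in the abstract, the block below gives their textbook forms for adversarial bandits. It is a sketch under assumptions, not the paper's own formulation: the symbols K (number of arms), T (horizon), x_{i,t} (reward of arm i at time t, assumed to lie in [0,1]), I_t (arm played at time t), and S (allowed number of switches in the comparator sequence) are notation introduced here for illustration, and the paper may state its definitions differently (for example, in terms of losses).

```latex
% Weak regret: compare against the best single arm in hindsight.
R_T^{\mathrm{weak}}
  = \max_{1 \le i \le K} \sum_{t=1}^{T} x_{i,t}
  - \mathbb{E}\!\left[ \sum_{t=1}^{T} x_{I_t,t} \right]

% Shifting regret: compare against the best arm sequence (j_1,\dots,j_T)
% that changes arms at most S times.
R_T^{\mathrm{shift}}(S)
  = \max_{\substack{(j_1,\dots,j_T) \in \{1,\dots,K\}^T \\ \#\{t \,:\, j_t \ne j_{t+1}\} \le S}}
    \sum_{t=1}^{T} x_{j_t,t}
  - \mathbb{E}\!\left[ \sum_{t=1}^{T} x_{I_t,t} \right]
```

Under these definitions a "sublinear regret order" means the regret grows as o(T), so the average per-round gap to the comparator vanishes as T grows. Setting S = 0 in the shifting regret recovers the weak regret.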


Updated: 2021-04-30
