Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states
Quantum (IF 6.4), Pub Date: 2022-06-29, DOI: 10.22331/q-2022-06-29-749
Josep Lumbreras, Erkka Haapasalo, Marco Tomamichel

We initiate the study of tradeoffs between exploration and exploitation in online learning of properties of quantum states. Given sequential oracle access to an unknown quantum state, in each round, we are tasked to choose an observable from a set of actions aiming to maximize its expectation value on the state (the reward). Information gained about the unknown state from previous rounds can be used to gradually improve the choice of action, thus reducing the gap between the reward and the maximal reward attainable with the given action set (the regret). We provide various information-theoretic lower bounds on the cumulative regret that an optimal learner must incur, and show that it scales at least as the square root of the number of rounds played. We also investigate the dependence of the cumulative regret on the number of available actions and the dimension of the underlying space. Moreover, we exhibit strategies that are optimal for bandits with a finite number of arms and general mixed states.
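
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes a single qubit, a hypothetical action set made of the three Pauli observables, and a standard classical UCB1-style rule as the learner. The state `rho`, the helper names `expectation`, `measure`, and `run_ucb`, and the exploration bonus are all assumptions made for illustration. The reward in each round is a measurement outcome sampled by the Born rule, and the tracked quantity is the cumulative expected regret, that is, the summed gap between the best attainable expectation value and the expectation value of the chosen observable.

```python
import numpy as np

# Pauli matrices: an illustrative (assumed) action set for a single qubit.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def expectation(rho, obs):
    """Expected reward Tr(rho @ obs) of observable obs on state rho."""
    return float(np.real(np.trace(rho @ obs)))


def measure(rho, obs, rng):
    """Sample one measurement outcome (an eigenvalue of obs) via the Born rule."""
    eigvals, eigvecs = np.linalg.eigh(obs)
    # Probability of outcome i is <v_i| rho |v_i>, with v_i the i-th column.
    probs = np.real(np.einsum("ji,jk,ki->i", eigvecs.conj(), rho, eigvecs))
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum()
    return float(rng.choice(eigvals, p=probs))


def run_ucb(rho, actions, rounds, seed=0):
    """Play a UCB1-style learner (an assumption, not the paper's strategy)
    and return the cumulative expected regret after each round."""
    rng = np.random.default_rng(seed)
    k = len(actions)
    counts = np.zeros(k)  # number of times each observable was measured
    sums = np.zeros(k)    # sum of observed outcomes per observable
    means = [expectation(rho, a) for a in actions]
    best = max(means)     # maximal reward attainable with this action set
    regret = np.zeros(rounds)
    total = 0.0
    for t in range(rounds):
        if t < k:
            arm = t  # measure every observable once to initialise
        else:
            # Outcomes lie in [-1, 1], so the usual bonus is scaled by 2.
            bonus = 2.0 * np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        outcome = measure(rho, actions[arm], rng)
        counts[arm] += 1
        sums[arm] += outcome
        total += best - means[arm]  # expected gap incurred this round
        regret[t] = total
    return regret


# Hypothetical mixed state with Bloch vector (0.3, 0, 0.6).
rho = 0.5 * (I2 + 0.3 * X + 0.6 * Z)
actions = [X, Y, Z]
regret = run_ucb(rho, actions, rounds=2000)
print("cumulative regret after 2000 rounds:", regret[-1])
```

Here the learner never sees `rho` directly, only the sampled outcomes; the true expectation values are used solely to score the regret. Plotting `regret` against the round index is one way to compare its growth with the square-root scaling that the abstract states as a lower bound.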

Updated: 2022-06-29