Arm order recognition in multi-armed bandit problem with laser chaos time series


arXiv - CS - Emerging Technologies, Pub Date: 2020-05-26, DOI: arxiv-2005.13085
Naoki Narisawa, Nicolas Chauvet, Mikio Hasegawa and Makoto Naruse

By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm that solves multi-armed bandit (MAB) problems using time-division multiplexing of laser chaos time series. Although that algorithm detects the arm with the highest reward expectation, it cannot correctly recognize the order of the arms in terms of reward expectations. Here, we present an algorithm in which the degree of exploration is adaptively controlled based on confidence intervals that represent the estimation accuracy of reward expectations. We demonstrate numerically that our approach significantly improves arm order recognition accuracy and reduces dependence on reward environments, while the total reward is almost maintained compared with conventional MAB methods. This study applies to sectors where order information is critical, such as efficient allocation of resources in information and communications technology.
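
To make the idea of confidence-interval-controlled exploration concrete, the following is a minimal Python sketch. It is not the authors' algorithm: it replaces the laser chaos time series with a seeded pseudo-random generator, assumes Bernoulli rewards, and uses a Hoeffding-style interval. The names `confidence_radius` and `rank_arms`, and the parameters `delta`, `horizon`, and `seed`, are hypothetical and chosen only for illustration. The sketch keeps exploring while the intervals of neighbouring arms in the current ranking overlap, and exploits the leading arm once every adjacent pair is separated.

```python
import math
import random


def confidence_radius(pulls: int, delta: float = 0.05) -> float:
    """Hoeffding-style half-width for the mean of `pulls` rewards in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * pulls))


def rank_arms(probs, horizon=5000, delta=0.05, seed=0):
    """Play a Bernoulli bandit; return (estimated order, pull counts, total reward)."""
    rng = random.Random(seed)  # assumption: pseudo-random stand-in, not laser chaos
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k
    total = 0

    def play(arm):
        nonlocal total
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward

    for arm in range(k):  # one initial pull per arm so every mean is defined
        play(arm)

    for _ in range(horizon - k):
        means = [wins[a] / pulls[a] for a in range(k)]
        radii = [confidence_radius(pulls[a], delta) for a in range(k)]
        order = sorted(range(k), key=lambda a: means[a], reverse=True)
        # Collect arms whose interval overlaps a neighbour's in the current ranking.
        ambiguous = set()
        for upper, lower in zip(order, order[1:]):
            if means[upper] - radii[upper] <= means[lower] + radii[lower]:
                ambiguous.update((upper, lower))
        if ambiguous:
            # Explore: sample the least-pulled arm whose rank is still uncertain.
            arm = min(sorted(ambiguous), key=lambda a: pulls[a])
        else:
            # Exploit: every adjacent pair is separated, so play the leader.
            arm = order[0]
        play(arm)

    means = [wins[a] / pulls[a] for a in range(k)]
    estimated_order = sorted(range(k), key=lambda a: means[a], reverse=True)
    return estimated_order, pulls, total
```

Calling `rank_arms([0.9, 0.5, 0.1])` returns the arm indices sorted by estimated reward expectation, the number of pulls per arm, and the accumulated reward. In this sketch, arms with close reward expectations receive more exploratory pulls because their intervals take longer to separate, which is one simple way to trade a small amount of reward for a more reliable ordering.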


Updated: 2020-05-28
