Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?


arXiv - CS - Hardware Architecture. Pub Date: 2021-06-05, DOI: arxiv-2106.02855
S. V. Sai Santosh, Sumit J. Darak

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off, without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable because of the Beta function. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of the appropriate MAB algorithm for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for a parallel implementation of the algorithms, resulting in large savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software co-design approaches. We demonstrate the superiority of RI-MAB over TS-only and UCB-only architectures.
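
To make the frequentist-versus-Bayesian contrast in the abstract concrete, the following is a minimal software sketch of the two arm-selection rules for Bernoulli arms. It is a textbook illustration, not the authors' hardware architecture: the function names (ucb1_select, ts_select, run_bandit), the UCB1 exploration bonus sqrt(2 ln t / n), the uniform Beta(1, 1) prior, and the simulated arm probabilities are all assumptions chosen for this example. The call to random.betavariate is the kind of Beta sampling step that the abstract describes as not directly synthesizable; the paper's pseudo-random-number-based approximation is not reproduced here.

```python
import math
import random


def ucb1_select(counts, wins, t):
    """Frequentist rule (UCB1): empirical mean plus an exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: wins[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def ts_select(wins, losses, rng):
    """Bayesian rule (Thompson Sampling): draw from each Beta posterior.

    Assumes a uniform Beta(1, 1) prior on every arm's success probability.
    """
    samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(policy, probs, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    wins = [0] * len(probs)
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb1_select(counts, wins, t)
        elif policy == "ts":
            losses = [c - w for c, w in zip(counts, wins)]
            arm = ts_select(wins, losses, rng)
        else:
            raise ValueError("policy must be 'ucb' or 'ts'")
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
    return counts


if __name__ == "__main__":
    arm_probs = [0.2, 0.5, 0.9]  # hypothetical arm statistics, unknown to the learner
    for policy in ("ucb", "ts"):
        print(policy, run_bandit(policy, arm_probs, horizon=2000, seed=1))
```

In this sketch both rules share the same per-arm counters, so switching between them only changes the selection function. That is a loose software analogue of the on-the-fly switching the abstract attributes to RI-MAB, not a description of how the SoC implements it.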


Updated: 2021-06-08
