Bayesian Reinforcement Learning and Bayesian Deep Learning for Blockchains with Mobile Edge Computing
IEEE Transactions on Cognitive Communications and Networking (IF 7.4) Pub Date: 2020-01-01, DOI: 10.1109/tccn.2020.2994366
Alia Asheralieva , Dusit Niyato

We present a novel game-theoretic, Bayesian reinforcement learning (RL) and deep learning (DL) framework to represent interactions of miners in public and consortium blockchains with mobile edge computing (MEC). Within the framework, we formulate a stochastic game played by miners under incomplete information. Each miner can offload its block operations to one of the base stations (BSs) equipped with an MEC server. The miners select their offloading BSs and block processing rates simultaneously and independently, without informing other miners about their actions. As such, no miner knows the past and current actions of the others and, hence, each miner constructs its own belief about these actions. Accordingly, we devise a Bayesian RL algorithm for miners' decision making, based on the partially observable Markov decision process, that allows each miner to dynamically adjust its strategy and update its beliefs through repeated interactions with the other miners and with the mobile environment. We also propose a novel unsupervised Bayesian deep learning algorithm in which the uncertainties about unobservable states are approximated with Bayesian neural networks. We show that the proposed Bayesian RL and DL algorithms converge to stable states in which the miners' actions and beliefs form the perfect Bayesian equilibrium (PBE) and the myopic PBE, respectively.
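
The belief construction mentioned in the abstract can be illustrated with a plain Bayes-rule update. The sketch below is not the authors' algorithm: it only assumes that a miner holds a categorical belief over which BS another miner has selected and revises it after one observation. The function name `update_belief`, the BS labels, and all probabilities are hypothetical values chosen for illustration.

```python
from typing import Dict


def update_belief(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior over hidden actions: posterior(a) is proportional to likelihood(a) * prior(a)."""
    # Weight each hypothesis by how well it explains the observation.
    unnormalized = {a: p * likelihood.get(a, 0.0) for a, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every hypothesis; keep the prior unchanged.
        return dict(prior)
    return {a: w / total for a, w in unnormalized.items()}


# Hypothetical example: a miner is initially unsure which BS a rival has chosen.
prior = {"BS1": 0.5, "BS2": 0.5}
# Assumed probability of the observed signal (e.g. high delay at BS1) under each hypothesis.
likelihood = {"BS1": 0.8, "BS2": 0.2}

posterior = update_belief(prior, likelihood)
print(posterior)
```

Repeating this update after every round of interaction is one simple way to picture how beliefs could evolve over time; the paper's actual scheme additionally involves a partially observable Markov decision process and Bayesian neural networks, which this sketch does not attempt to model.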

Updated: 2020-01-01