When blockchain meets AI: Optimal mining strategy achieved by machine learning
International Journal of Intelligent Systems (IF 5.0), Pub Date: 2021-02-01, DOI: 10.1002/int.22375
Taotao Wang 1 , Soung Chang Liew 2 , Shengli Zhang 1

This study applies reinforcement learning (RL), a branch of machine learning, to derive an optimal mining strategy for Bitcoin-like blockchains. A salient feature of the RL framework is that an optimal (or near-optimal) strategy can be obtained without knowing the details of the blockchain network model. Previously, the most profitable mining strategy was believed to be honest mining, as encoded in the default blockchain protocol. It was later shown that a miner can gain more mining rewards by deviating from honest mining. In particular, the mining problem can be formulated as a Markov Decision Process (MDP), which can be solved to give the optimal mining strategy. However, solving the mining MDP requires knowing the values of various parameters that characterize the blockchain network model. In real blockchain networks, these parameter values are not easy to obtain and may change over time, which hinders the use of the MDP model-based solution. In this study, we employ RL to dynamically learn a mining strategy whose performance approaches that of the optimal mining strategy. Since the mining MDP has a nonlinear objective function (rather than the linear objective of standard MDP problems), we design a new multidimensional RL algorithm to solve the problem. Experimental results indicate that, without knowing the parameter values of the mining MDP model, our multidimensional RL mining algorithm can still achieve optimal performance over time-varying blockchain networks.
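The claim that deviating from honest mining can pay off can be illustrated with a small simulation. The sketch below is not the multidimensional RL algorithm of this paper. It simulates the fixed "selfish mining" strategy from the earlier literature (Eyal and Sirer), under assumed parameter names: `alpha` (the deviating miner's share of hash power) and `gamma` (the fraction of honest miners that build on the deviating miner's block during a tie). These stand in for the kind of network parameters the abstract says are hard to obtain in practice. The returned quantity is a ratio of reward counts, which hints at why the objective is nonlinear.

```python
import random


def selfish_mining_share(alpha, gamma, n_blocks=200_000, seed=0):
    """Estimate the relative reward share of a selfish miner by simulation.

    alpha and gamma are illustrative parameter names (assumptions):
    alpha = selfish miner's hash-power share, gamma = fraction of honest
    miners that mine on the selfish branch during a one-block tie.
    """
    rng = random.Random(seed)
    lead = 0          # unpublished private blocks ahead of the public chain
    fork = False      # True while two competing one-block branches exist
    attacker = 0      # blocks credited to the selfish miner
    honest = 0        # blocks credited to honest miners

    for _ in range(n_blocks):
        attacker_found = rng.random() < alpha
        if fork:
            if attacker_found:
                attacker += 2                 # selfish branch wins outright
            elif rng.random() < gamma:
                attacker += 1                 # honest block extends selfish branch
                honest += 1
            else:
                honest += 2                   # honest branch wins
            fork = False
        elif attacker_found:
            lead += 1                         # keep the new block private
        elif lead == 0:
            honest += 1                       # nothing withheld, honest block stands
        elif lead == 1:
            fork = True                       # publish the private block, race begins
            lead = 0
        elif lead == 2:
            attacker += 2                     # publish both blocks, override honest block
            lead = 0
        else:
            attacker += 1                     # reveal one block, still safely ahead
            lead -= 1

    # Ratio objective: a share of all rewarded blocks, not a plain sum of rewards.
    return attacker / (attacker + honest)


if __name__ == "__main__":
    for a in (0.1, 0.25, 0.4):
        print(a, round(selfish_mining_share(a, gamma=0.5), 3))
```

Under these assumptions an honest miner with hash share `alpha` earns a share of about `alpha`, so any returned value above `alpha` means the deviation is profitable. With small `alpha` and `gamma = 0` the simulated share falls below `alpha`, which shows that the best strategy depends on the network parameters.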

Updated: 2021-03-31