A kernel based learning method for non-stationary two-player repeated games
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-04-01, DOI: 10.1016/j.knosys.2020.105820
Renan Motta Goulart, Saul C. Leite, Raul Fonseca Neto

Repeated games are a branch of game theory in which the same game is played several times by the players involved. In this setting, players may not always play the optimal strategy, or they may be willing to collaborate or adopt other behaviors that could lead to a higher long-term profit. Since the same game is repeated for several rounds, and considering a scenario with complete information, a player can analyze its opponent’s behavior to find patterns, which can then be used to predict the opponent’s actions. Such a setting, where players have mutual information about past moves and do not always play in equilibrium, leads naturally to non-stationary environments in which players can frequently modify their strategies to get ahead in the game. In this work, we propose a novel algorithm based on string kernel density estimation that predicts the opponent’s actions in repeated games and can be used to optimize the player’s profit over time. The prediction is not limited to the next round’s action: it can also cover a finite sequence of future rounds, which can be combined with a lookahead search scheme of limited depth. The experiments show that the proposed algorithm learns and adapts rapidly, providing good results even when the opponent also adopts an adaptive strategy.
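The abstract does not spell out the kernel or the estimator, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes the history holds the opponent's moves only, uses a simple suffix-matching similarity as a stand-in for a string kernel, and adds an exponential forgetting factor to cope with non-stationarity. The names `suffix_match_kernel` and `predict_next` and the parameters `context_len`, `decay`, and `forget` are hypothetical.

```python
def suffix_match_kernel(a, b, decay=0.5):
    """Similarity of two move sequences, comparing from the most recent move
    backwards and stopping at the first mismatch (assumed stand-in kernel)."""
    score, weight = 0.0, 1.0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        score += weight
        weight *= decay  # older matching positions contribute less
    return score


def predict_next(history, actions, context_len=3, decay=0.5, forget=0.9):
    """Estimate a distribution over the opponent's next action.

    history: opponent's past actions, oldest first (assumption: own moves
    are ignored here, although a fuller model would likely include them).
    """
    scores = {a: 1e-9 for a in actions}  # tiny prior so the result is defined
    current = history[-context_len:]
    n = len(history)
    for t in range(1, n):
        past_context = history[max(0, t - context_len):t]
        k = suffix_match_kernel(current, past_context, decay)
        # Discount old observations so recent behavior dominates.
        scores[history[t]] += k * forget ** (n - 1 - t)
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


# Hypothetical usage: an opponent cycling through rock, paper, scissors.
dist = predict_next(list("RPSRPSRPS"), "RPS")
predicted = max(dist, key=dist.get)
```

A player could then pick the action that maximizes expected payoff under `dist`. Predicting several rounds ahead, as the abstract mentions, would amount to appending a predicted move to the history and calling `predict_next` again inside a depth-limited search.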




Updated: 2020-04-01