An empirical evaluation of active inference in multi-armed bandits
Neural Networks (IF 7.8) Pub Date: 2021-08-26, DOI: 10.1016/j.neunet.2021.08.018
Dimitrije Marković, Hrvoje Stojić, Sarah Schwöbel, Stefan J. Kiebel

A key feature of sequential decision making under uncertainty is the need to balance exploiting (choosing the best action according to current knowledge) and exploring (obtaining information about the values of other actions). The multi-armed bandit problem, a classical task that captures this trade-off, has served in machine learning as a vehicle for developing bandit algorithms that have proved useful in numerous industrial applications. The active inference framework, an approach to sequential decision making recently developed in neuroscience for understanding human and animal behaviour, is distinguished by its sophisticated strategy for resolving the exploration–exploitation trade-off. This makes active inference an exciting alternative to established bandit algorithms. Here we derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms: Bayesian upper confidence bound and optimistic Thompson sampling. The comparison is done on two types of bandit problems: a stationary bandit and a dynamic switching bandit. Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits. However, in the more challenging switching bandit problem, active inference performs substantially better than the two state-of-the-art bandit algorithms. The results open exciting avenues for further research in theoretical and applied machine learning, and lend additional credibility to active inference as a general framework for studying human and animal behaviour.
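The paper's baselines are Bayesian upper confidence bound and optimistic Thompson sampling; its own contribution is the approximate active inference algorithm, which is not reproduced here. As a minimal, self-contained illustration of the stationary bandit setting only, the sketch below runs standard (non-optimistic) Thompson sampling with Beta posteriors on a Bernoulli bandit; arm probabilities, horizon, and function name are all illustrative assumptions, not values from the paper.

```python
import random

def thompson_bernoulli(true_probs, n_steps=2000, seed=0):
    """Standard Thompson sampling on a stationary Bernoulli bandit.

    Each arm starts with a Beta(1, 1) prior. At every step we sample a
    success probability from each arm's posterior, pull the arm with the
    highest draw, and update that arm's Beta counts with the 0/1 reward.
    Returns the average reward over the run.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    total = 0
    for _ in range(n_steps):
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += reward
    return total / n_steps

# With arms [0.3, 0.5, 0.7], the average reward should approach 0.7 as
# the posterior concentrates on the best arm.
avg = thompson_bernoulli([0.3, 0.5, 0.7])
```

The switching bandit studied in the paper breaks the stationarity assumption this sketch relies on: once the Beta counts grow large, the posterior adapts slowly to a changed reward probability, which is the regime where the paper reports active inference performing best.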




Updated: 2021-09-08