Criticality-based Varying Step-number Algorithm for Reinforcement Learning
International Journal on Artificial Intelligence Tools (IF 1.1). Pub Date: 2021-06-30. DOI: 10.1142/s0218213021500196. Yitzhak Spielberg 1, Amos Azaria 1
In the context of reinforcement learning, we introduce the concept of the criticality of a state, which indicates the extent to which the choice of action in that particular state influences the expected return. That is, a state in which the choice of action is more likely to influence the final outcome is considered more critical than a state in which it is less likely to do so.
We formulate a criticality-based varying step number algorithm (CVS) — a flexible step number algorithm that utilizes a criticality function provided by a human or learned directly from the environment. We test it in three domains: the Atari Pong environment, the Road-Tree environment, and the Shooter environment. We demonstrate that CVS outperforms popular learning algorithms such as Deep Q-Learning and Monte Carlo.
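The core idea — varying the step number of an n-step return based on a per-state criticality score — can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the `choose_step_number` mapping (here, higher criticality yields fewer steps) and its bounds `n_min`/`n_max` are assumptions made for the sake of the example.

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    """Standard n-step return: discounted rewards over n steps,
    plus a bootstrapped value estimate at step t + n."""
    n = min(n, len(rewards) - t)  # truncate at episode end
    g = sum(gamma**k * rewards[t + k] for k in range(n))
    if t + n < len(values):
        g += gamma**n * values[t + n]
    return g

def choose_step_number(criticality, n_min=1, n_max=8):
    """Map a criticality score in [0, 1] to a step count.
    The direction of the mapping (more critical -> fewer steps,
    i.e. rely more on the bootstrapped value near critical states)
    is an illustrative assumption, not taken from the paper."""
    frac = 1.0 - max(0.0, min(1.0, criticality))
    return n_min + round(frac * (n_max - n_min))

# Usage: the update target for state s_t uses a step number
# determined by that state's criticality.
rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.5, 0.5]
n = choose_step_number(criticality=0.9)     # a highly critical state
target = n_step_return(rewards, values, t=0, n=n, gamma=1.0)
```

A criticality function supplied by a human expert would simply replace the scalar passed to `choose_step_number` with a domain-specific evaluation of the current state.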