Policy Sharing using Aggregation Trees for Q-learning in a Continuous State and Action Spaces
IEEE Transactions on Cognitive and Developmental Systems (IF 5) Pub Date: 2020-09-01, DOI: 10.1109/tcds.2019.2926477
Yu-Jen Chen , Wei-Cheng Jiang , Ming-Yi Ju , Kao-Shing Hwang

$Q$-learning is a generic approach that uses a finite discrete state and action domain to estimate action values with tabular or function-approximation methods. An intelligent agent eventually learns policies from continuous sensory inputs and encodes these environmental inputs onto a discrete state space. The application of $Q$-learning in a continuous state/action domain is the subject of many studies. This paper uses a tree structure to approximate a $Q$-function in a continuous state domain. The agent selects a discretized action with the maximum $Q$-value, and this discretized action is then extended to a continuous action using an action bias function. Reinforcement learning is difficult for a single agent when the state space is huge. The proposed architecture is also applied to a multiagent system, wherein an individual agent transfers its useful $Q$-values to other agents to accelerate the learning process. Policy is shared between agents by grafting the branches of trees in which $Q$-values are stored onto other trees. Simulation results show that the proposed architecture performs better than tabular $Q$-learning and significantly accelerates the learning process because all agents use the sharing mechanisms to cooperate with each other.
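To make the pipeline described in the abstract concrete, the sketch below is a minimal Python illustration of the general idea, not the authors' implementation: a state-aggregation tree whose leaves store $Q$-values for a set of discretized actions, an action-bias step that turns the greedy discrete action into a continuous one, and a simple $Q$-value-sharing step standing in for the paper's branch grafting. The class names, the bias formula, and the blending rule in `graft_from` are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's exact algorithm): Q-learning over a
# state-aggregation tree, with an action-bias step and a simple policy-sharing step.
import numpy as np

class Node:
    """A region of the continuous state space; leaves store Q-values per discrete action."""
    def __init__(self, low, high, n_actions):
        self.low, self.high = np.asarray(low, float), np.asarray(high, float)
        self.q = np.zeros(n_actions)           # Q-values for the discretized actions
        self.left = self.right = None          # children after a split
        self.split_dim = self.split_val = None

    def leaf(self, s):
        """Descend to the leaf whose region contains state s."""
        node = self
        while node.left is not None:
            node = node.left if s[node.split_dim] <= node.split_val else node.right
        return node

    def split(self, dim):
        """Refine this leaf by halving it along one dimension; children inherit its Q-values."""
        mid = 0.5 * (self.low[dim] + self.high[dim])
        self.split_dim, self.split_val = dim, mid
        lo_hi = self.high.copy(); lo_hi[dim] = mid
        hi_lo = self.low.copy();  hi_lo[dim] = mid
        self.left  = Node(self.low, lo_hi, len(self.q));  self.left.q  = self.q.copy()
        self.right = Node(hi_lo, self.high, len(self.q)); self.right.q = self.q.copy()


class TreeQAgent:
    def __init__(self, low, high, action_centers, alpha=0.1, gamma=0.95, eps=0.1):
        self.root = Node(low, high, len(action_centers))
        self.action_centers = np.asarray(action_centers, float)  # discretized (1-D) actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        """Pick the discrete action with maximal Q, then bias it toward a continuous action."""
        leaf = self.root.leaf(s)
        a = np.random.randint(len(leaf.q)) if np.random.rand() < self.eps else int(np.argmax(leaf.q))
        return a, self.action_centers[a] + self.action_bias(leaf, a)

    def action_bias(self, leaf, a):
        """Hypothetical bias: shift toward the better neighbouring action, weighted by its Q-value."""
        q = leaf.q
        nb = min(a + 1, len(q) - 1) if (a == 0 or (a + 1 < len(q) and q[a + 1] >= q[a - 1])) else a - 1
        gap = self.action_centers[nb] - self.action_centers[a]
        w = np.exp(q[nb]) / (np.exp(q[a]) + np.exp(q[nb]))
        return 0.5 * w * gap

    def update(self, s, a, r, s_next):
        """One-step Q-learning backup on the leaf that covers s."""
        leaf, nxt = self.root.leaf(s), self.root.leaf(s_next)
        leaf.q[a] += self.alpha * (r + self.gamma * nxt.q.max() - leaf.q[a])

    def graft_from(self, other):
        """Policy sharing: pull the other agent's Q-values into matching leaves of this tree
        (a simplified stand-in for grafting whole branches between trees)."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.left is None:
                centre = 0.5 * (node.low + node.high)
                src = other.root.leaf(centre)
                node.q = 0.5 * (node.q + src.q)   # blend rather than overwrite (an assumption)
            else:
                stack += [node.left, node.right]
```

In this sketch, sharing is periodic and symmetric: each agent calls `graft_from` on its peers so that useful $Q$-values propagate without retraining from scratch, which mirrors the acceleration effect the abstract attributes to the sharing mechanism.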

Updated: 2020-09-01