Salience Interest Option: Temporal abstraction with salience interest functions
Neural Networks (IF 7.8) Pub Date: 2024-04-25, DOI: 10.1016/j.neunet.2024.106342
Xianchao Zhu, Liang Zhao, William Zhu

Reinforcement Learning (RL) is a significant subfield of machine learning that focuses on learning actions from interaction with an environment in order to obtain an optimal behavior policy. RL agents can make decisions at variable time scales in the form of temporal abstractions, also known as options. The problem of discovering options has attracted considerable research effort. Most notably, the Interest Option Critic (IOC) algorithm first extends the initiation set to an interest function, providing a method for learning options specialized to certain regions of the state space; this amounts to a specific attention mechanism for action selection. Unfortunately, when options are learned end-to-end through backpropagation, this method still suffers from the classic RL issues of poor data efficiency and lack of flexibility. This paper proposes a new approach called Salience Interest Option Critic (SIOC), which selects subsets of the existing initiation sets for RL. Specifically, these subsets are learned not by backpropagation, which is slow and tends to overfit, but by particle filters. This approach enables the rapid and flexible identification of critical subsets using only reward feedback. We conducted experiments in discrete and continuous domains, and our proposed method demonstrates higher efficiency and flexibility than other methods. The generated options are more valuable within a single task and exhibit greater interpretability and reusability in multi-task learning scenarios.
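
The abstract gives no implementation details, so the following is only a minimal sketch, in Python/NumPy, of the general idea it describes: maintaining a population of candidate initiation-set subsets and filtering them with reward feedback alone, with no backpropagation. Everything here is an illustrative assumption rather than the authors' algorithm: the binary-mask encoding of a subset, the stand-in episode_return function, and the resample-and-perturb loop are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 20      # hypothetical discrete state space
N_PARTICLES = 100  # particles over candidate salient subsets

# Each particle is a binary mask over states: a candidate subset of an
# option's initiation set (1 = the option may be initiated in that state).
particles = rng.integers(0, 2, size=(N_PARTICLES, N_STATES)).astype(float)

def episode_return(mask):
    # Stand-in for a real RL rollout: the return observed when the option
    # is only initiable in the states selected by `mask`. Here we assume,
    # purely for illustration, that states 0-4 are the truly salient ones.
    target = np.zeros(N_STATES)
    target[:5] = 1.0
    return -np.abs(mask - target).sum() + rng.normal(0.0, 0.1)

for step in range(50):
    # 1. Weight each particle by reward feedback alone (no gradients).
    returns = np.array([episode_return(p) for p in particles])
    weights = np.exp(returns - returns.max())  # softmax-style weighting
    weights /= weights.sum()

    # 2. Resample particles in proportion to their weights.
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
    particles = particles[idx]

    # 3. Perturb: flip one random state in or out of each subset, so the
    #    filter keeps exploring nearby candidate subsets.
    rows = np.arange(N_PARTICLES)
    flips = rng.integers(0, N_STATES, size=N_PARTICLES)
    particles[rows, flips] = 1.0 - particles[rows, flips]

best = particles[np.argmax([episode_return(p) for p in particles])]
print("estimated salient subset:", np.where(best > 0.5)[0].tolist())

Because the filter only ever consumes scalar returns, this kind of search sidesteps the data-efficiency and overfitting issues the abstract attributes to end-to-end backpropagation of interest functions.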

Updated: 2024-04-25