Interpretable policies for reinforcement learning by empirical fuzzy sets
Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2020-02-27, DOI: 10.1016/j.engappai.2020.103559
Jianfeng Huang , Plamen P. Angelov , Chengliang Yin

This paper proposes a method and an algorithm for interpretable fuzzy reinforcement learning (IFRL). It offers alternative solutions to common problems in RL, such as function approximation and continuous action spaces. The learning process resembles that of human beings: the encountered states are clustered, experience is accumulated for each typical case, and decisions are made fuzzily. The learned policy can be expressed as human-intelligible IF-THEN rules, which facilitates further investigation and improvement. The method adopts the actor–critic architecture while differing from mainstream policy gradient methods. The value function is approximated by the AnYa fuzzy system. The state–action space is discretized into a static grid of nodes; each node is treated as one prototype and corresponds to one fuzzy rule, with the value of the node as the consequent. The consequent values are updated with the Sarsa(λ) algorithm. The probability distribution of optimal actions over different states is estimated through Empirical Data Analytics (EDA), Autonomous Learning Multi-Model Systems (ALMMo), and Empirical Fuzzy Sets (εFS). The fuzzy kernel of IFRL avoids the lack of interpretability of methods based on neural networks. Simulation results on four problems, namely Mountain Car, Continuous Gridworld, Pendulum Position, and Tank Level Control, are presented as a proof of the proposed concept.
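
The pipeline the abstract describes (a static grid of prototype nodes, fuzzy inference weighting one rule per node, and Sarsa(λ) updates of the rule consequents) can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: for readability it grids only the state space and uses a small discrete action set, the Cauchy-kernel membership and all hyperparameters are assumptions, and the EDA/ALMMo/εFS estimate of the action distribution is omitted.

```python
# Minimal sketch of fuzzy value approximation over a static grid of prototypes,
# with Sarsa(lambda) updating the rule consequents. Illustrative only: the grid
# resolution, Cauchy-kernel membership, and hyperparameters are assumptions.
import numpy as np

class FuzzySarsaLambda:
    def __init__(self, state_lo, state_hi, n_nodes, actions,
                 alpha=0.1, gamma=0.99, lam=0.9):
        # One prototype (one fuzzy rule) per node of a static grid over the state space.
        axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(state_lo, state_hi, n_nodes)]
        mesh = np.meshgrid(*axes, indexing="ij")
        self.prototypes = np.stack([m.ravel() for m in mesh], axis=1)
        self.actions = actions
        # One consequent value per (rule, action) pair; these are the THEN parts.
        self.q = np.zeros((len(self.prototypes), len(actions)))
        self.e = np.zeros_like(self.q)  # eligibility traces for Sarsa(lambda)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def firing(self, state):
        # AnYa-style antecedent: kernel density around each prototype,
        # normalized so the firing strengths sum to one (relative density).
        d2 = ((self.prototypes - np.asarray(state)) ** 2).sum(axis=1)
        dens = 1.0 / (1.0 + d2)
        return dens / dens.sum()

    def value(self, state, a_idx):
        # Fuzzy inference: firing-strength-weighted sum of the consequents.
        return self.firing(state) @ self.q[:, a_idx]

    def update(self, s, a_idx, r, s_next, a_next_idx, done):
        # Sarsa(lambda): the TD error is distributed over rules by firing strength.
        target = r if done else r + self.gamma * self.value(s_next, a_next_idx)
        delta = target - self.value(s, a_idx)
        self.e *= self.gamma * self.lam
        self.e[:, a_idx] += self.firing(s)
        self.q += self.alpha * delta * self.e
        if done:
            self.e[:] = 0.0

    def rules(self, top=5):
        # Render the learned policy as human-readable IF-THEN rules.
        return [f"IF state is near {np.round(p, 2)} "
                f"THEN action = {self.actions[int(np.argmax(self.q[i]))]}"
                for i, p in enumerate(self.prototypes[:top])]
```

As a usage illustration, for Mountain Car one might build `FuzzySarsaLambda([-1.2, -0.07], [0.6, 0.07], [9, 9], actions=[-1.0, 0.0, 1.0])`, act ε-greedily over `value(s, a_idx)` during training, and then inspect `rules()` to read off the policy as IF-THEN statements.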



Updated: 2020-02-27