Interpretable-AI Policies using Evolutionary Nonlinear Decision Trees for Discrete Action Systems
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-09-20, DOI: arxiv-2009.09521
Yashesh Dhebar, Kalyanmoy Deb, Subramanya Nageshrao, Ling Zhu and Dimitar Filev

Black-box artificial intelligence (AI) induction methods such as deep reinforcement learning (DRL) are increasingly used to find optimal policies for a given control task. Although policies represented by a black-box AI can execute the underlying control task efficiently and achieve optimal closed-loop performance (controlling the agent from the initial time step until the successful termination of an episode), the resulting control rules are often complex and neither interpretable nor explainable. In this paper, we use a recently proposed nonlinear decision-tree (NLDT) approach to find a hierarchical set of control rules that approximates and explains a pre-trained black-box DRL (oracle) agent. The rules are derived from a labelled state-action dataset so as to maximize open-loop performance. Recent advances in nonlinear optimization using evolutionary computation facilitate finding a hierarchical set of nonlinear control rules as a function of state variables, using a computationally fast bilevel optimization procedure at each node of the proposed NLDT. Additionally, we propose a re-optimization procedure for enhancing the closed-loop performance of an already derived NLDT. We evaluate the proposed methodologies on four different control problems having two to four discrete actions. In all these problems, the proposed approach finds simple and interpretable rules involving one to four nonlinear terms per rule, while achieving closed-loop performance on par with a trained black-box DRL agent. The results are encouraging: they suggest that complicated black-box DRL policies involving thousands of parameters (which makes them non-interpretable) can be replaced with simple interpretable policies, and they motivate applying the proposed approach to more complex control tasks.
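
To make the idea of a hierarchical set of nonlinear rules over state variables more concrete, here is a minimal Python sketch of how such a tree could be evaluated as a discrete-action policy and scored against oracle labels. It is not the authors' implementation. The rule form (a bias plus a weighted sum of products of powers of the state variables), the names `Node`, `Leaf`, `rule_value`, `policy`, and `open_loop_accuracy`, and every number in the toy tree and dataset are assumptions made for illustration. Reading "open-loop performance" as label agreement on the state-action dataset is likewise only one plausible interpretation of the abstract. The evolutionary bilevel search that would actually find the weights and exponents is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass
class Leaf:
    action: int  # discrete action index returned at this leaf


@dataclass
class Node:
    # Assumed rule form (not confirmed by the abstract):
    #   bias + sum_i weights[i] * prod_j state[j] ** exponents[i][j]
    weights: Sequence[float]            # one weight per nonlinear term
    exponents: Sequence[Sequence[int]]  # exponents[i][j]: power of state[j] in term i
    bias: float
    left: Union["Node", Leaf]   # branch taken when the rule value is <= 0
    right: Union["Node", Leaf]  # branch taken when the rule value is > 0


def rule_value(node: Node, state: Sequence[float]) -> float:
    """Evaluate the node's nonlinear rule at the given state."""
    total = node.bias
    for weight, powers in zip(node.weights, node.exponents):
        term = 1.0
        for x, p in zip(state, powers):
            term *= x ** p  # non-negative integer powers keep this well defined
        total += weight * term
    return total


def policy(tree: Union[Node, Leaf], state: Sequence[float]) -> int:
    """Walk from the root to a leaf and return that leaf's action."""
    current = tree
    while isinstance(current, Node):
        if rule_value(current, state) <= 0:
            current = current.left
        else:
            current = current.right
    return current.action


def open_loop_accuracy(
    tree: Union[Node, Leaf],
    dataset: Sequence[Tuple[Sequence[float], int]],
) -> float:
    """Fraction of labelled states where the tree picks the oracle's action."""
    matches = sum(policy(tree, state) == action for state, action in dataset)
    return matches / len(dataset)


# Hypothetical one-node tree over a two-variable state (x0, x1), two actions.
# Rule: x0 * x1 + 0.5 * x1**2 - 0.1
tree = Node(
    weights=[1.0, 0.5],
    exponents=[[1, 1], [0, 2]],
    bias=-0.1,
    left=Leaf(action=0),
    right=Leaf(action=1),
)

# Made-up (state, oracle action) pairs standing in for the labelled dataset.
dataset = [
    ((1.0, 1.0), 1),
    ((-1.0, 1.0), 0),
    ((0.0, 0.0), 0),
    ((2.0, -1.0), 1),
]

if __name__ == "__main__":
    print("open-loop accuracy:", open_loop_accuracy(tree, dataset))
```

In this sketch the whole policy is one readable inequality per node, which is the kind of compactness the abstract contrasts with a DRL network of thousands of parameters. Closed-loop performance would instead be measured by running `policy` inside the control environment for full episodes, which this fragment does not attempt.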

Updated: 2020-09-22