Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis
Complex & Intelligent Systems ( IF 5.0 ) Pub Date : 2020-07-25 , DOI: 10.1007/s40747-020-00175-y
Hengzhe Zhang , Aimin Zhou , Xin Lin

Reinforcement learning based on deep neural networks has attracted much attention and has been widely used in real-world applications. However, the black-box nature of neural networks limits their use in high-stakes areas such as manufacturing and healthcare. To address this problem, some researchers have turned to interpretable control policy generation algorithms. The basic idea is to use an interpretable model, such as tree-based genetic programming, to extract a policy from a black-box model such as a neural network. Following this idea, in this paper we apply another form of genetic programming, evolutionary feature synthesis, to extract a control policy from a neural network. We also propose an evolutionary method that automatically optimizes the operator set of the control policy for each specific problem. Moreover, a policy simplification strategy is introduced. We conduct experiments on four reinforcement learning environments. The results reveal that evolutionary feature synthesis can extract a policy from a neural network with better performance than tree-based genetic programming, at comparable interpretability.
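The general distillation scheme the abstract describes — query a black-box policy for state-action pairs, then evolve an interpretable model that imitates it — can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the `teacher_policy`, the two-dimensional state, the fixed operator set, and the mutate-and-keep loop are all simplifying assumptions (the paper's method additionally evolves the operator set itself and simplifies the resulting policy).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "teacher": pretend this is a trained neural-network policy
# mapping a 2-D state to a continuous action (hypothetical dynamics).
def teacher_policy(s):
    return np.tanh(1.5 * s[:, 0] - 0.7 * s[:, 1] ** 2)

# 1. Collect state-action pairs by querying the black-box teacher.
states = rng.uniform(-1.0, 1.0, size=(500, 2))
actions = teacher_policy(states)

# Operator set from which features are synthesized (in the paper this
# set is itself optimized per problem; here it is fixed for brevity).
unary_ops = [np.tanh, np.sin, np.square, lambda x: x]

def random_feature():
    """Synthesize one feature: op(state[i]) with a random op and index."""
    op = unary_ops[rng.integers(len(unary_ops))]
    i = rng.integers(2)
    return lambda s, op=op, i=i: op(s[:, i])

def fit(features):
    """Least-squares weights for a linear model over the synthesized features."""
    X = np.column_stack([f(states) for f in features] + [np.ones(len(states))])
    w, *_ = np.linalg.lstsq(X, actions, rcond=None)
    err = np.mean((X @ w - actions) ** 2)
    return w, err

# 2. Simple evolutionary loop: mutate the feature set, keep improvements.
features = [random_feature() for _ in range(3)]
_, best_err = fit(features)
for _ in range(200):
    cand = list(features)
    cand[rng.integers(len(cand))] = random_feature()  # point mutation
    _, err = fit(cand)
    if err < best_err:
        features, best_err = cand, err

print(f"distilled-policy MSE vs. teacher: {best_err:.4f}")
```

The distilled policy is a linear combination of a few synthesized features, so it can be printed and inspected — which is the interpretability advantage the abstract claims over the opaque teacher network.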




Updated: 2020-07-25