Proposal and evaluation of deep exploitation-oriented learning under multiple reward environment
Cognitive Systems Research (IF 3.9), Pub Date: 2021-07-16, DOI: 10.1016/j.cogsys.2021.07.002
Kazuteru Miyazaki

Recently, deep reinforcement learning (DRL) has attracted considerable attention. The well-known deep Q-network (DQN) architecture successfully combines deep learning with Q-learning, a representative reinforcement learning (RL) method. In general, RL and DRL require many trial-and-error searches. To overcome this limitation, alternative approaches called exploitation-oriented learning (XoL) and deep exploitation-oriented learning (DXoL) have been proposed.
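
For readers unfamiliar with the Q-learning component that DQN generalizes with a neural network, below is a minimal sketch of the standard tabular Q-learning update. It is a generic illustration only; the names and values (n_states, n_actions, alpha, gamma) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Generic tabular Q-learning, the algorithm DQN approximates with a deep network.
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99            # learning rate and discount factor (assumed values)
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    """One trial-and-error step: move Q(s, a) toward the bootstrapped target."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```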

Although the effectiveness of DXoL for DQNs has been verified, its effectiveness in environments where multiple types of rewards are present remains unclear. In this study, we apply the DXoL method to two applications with multiple reward types: a driver drowsiness determination system and a decision-making system. Our experimental results show that, in these applications, DXoL is better suited than DQNs to learning priorities among multiple rewards.
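
As a purely hypothetical illustration of what a multiple-reward environment can look like, the sketch below shows one way a single environment step might expose several reward types and how fixed priority weights could collapse them into a scalar. The class and field names (StepResult, rewards, priorities) are invented for this sketch and do not describe the paper's drowsiness-determination or decision-making systems; the point of the abstract is precisely that DXoL learns such priorities from experience rather than having them fixed by hand.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    """Hypothetical result of one environment step with several reward types."""
    observation: list
    rewards: dict        # e.g. {"safety": 1.0, "comfort": 0.0}
    done: bool

def scalarize(rewards: dict, priorities: dict) -> float:
    """Collapse multiple reward types into one scalar with hand-fixed weights.
    A learner such as DXoL would instead have to discover which reward type
    takes priority through interaction."""
    return sum(priorities[k] * v for k, v in rewards.items())
```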


