Automatic landmark discovery for learning agents under partial observability, The Knowledge Engineering Review



Automatic landmark discovery for learning agents under partial observability
The Knowledge Engineering Review (IF 2.1), Pub Date: 2019-08-05, DOI: 10.1017/s026988891900002x
Alper Demir, Erkin Çilden, Faruk Polat

In reinforcement learning, a landmark is a compact piece of information that uniquely identifies a state in problems with hidden states. Landmarks have been shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(λ), is known to offer better learning performance under the assumption that all landmarks of the problem are known in advance. In this paper, we propose a framework built upon SarsaLandmark that automatically identifies landmarks during learning, without sacrificing quality and without requiring prior information about the problem structure. To this end, the framework combines SarsaLandmark with a well-known multiple-instance learning algorithm, Diverse Density (DD). Through further experimentation, we also provide deeper insight into our concept filtering heuristic for accelerating DD, abbreviated DDCF (Diverse Density with Concept Filtering), which proves suitable for POMDPs with landmarks. DDCF outperforms its predecessor in computation speed and solution quality without loss of generality. The methods are shown empirically to be effective through extensive experiments on a number of known and newly introduced problems with hidden state, and the results are discussed.
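
The abstract names Diverse Density (DD) as the multiple-instance learner used to find landmarks, but it does not spell out the computation. As a rough illustration only, the sketch below scores candidate concepts with the commonly used noisy-or form of DD on toy data. The function names, the Gaussian-style match probability, the `scale` parameter, and the example bags are all assumptions made for this sketch. It is not the authors' DDCF implementation, and it does not show how the paper builds bags from agent experience.

```python
import math


def instance_prob(instance, concept, scale=1.0):
    """Assumed match probability: exp(-scale * squared Euclidean distance)."""
    d2 = sum((a - b) ** 2 for a, b in zip(instance, concept))
    return math.exp(-scale * d2)


def diverse_density(concept, positive_bags, negative_bags, scale=1.0):
    """Noisy-or Diverse Density of a candidate concept.

    A concept scores high when every positive bag has at least one
    instance near it and no instance of any negative bag is near it.
    """
    dd = 1.0
    for bag in positive_bags:
        # Probability that no instance in this positive bag matches.
        miss = 1.0
        for inst in bag:
            miss *= 1.0 - instance_prob(inst, concept, scale)
        dd *= 1.0 - miss
    for bag in negative_bags:
        # Every instance in a negative bag should fail to match.
        for inst in bag:
            dd *= 1.0 - instance_prob(inst, concept, scale)
    return dd


def best_concept(positive_bags, negative_bags, scale=1.0):
    """Pick the highest-scoring concept among positive-bag instances.

    Restricting candidates to observed positive instances is a
    simplification for this sketch, not a claim about the paper.
    """
    candidates = {inst for bag in positive_bags for inst in bag}
    return max(
        candidates,
        key=lambda c: diverse_density(c, positive_bags, negative_bags, scale),
    )


# Hypothetical toy bags of 2-D observation features.
positive_bags = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(5.0, 5.0), (9.0, 1.0)],
]
negative_bags = [
    [(0.0, 0.0), (9.0, 1.0)],
]

winner = best_concept(positive_bags, negative_bags)
```

In this toy data, `(5.0, 5.0)` is the only instance that appears in every positive bag and in no negative bag, so it is the one `best_concept` selects. A concept filtering step such as the one the abstract describes would presumably prune the candidate set before scoring, but its details are not given in this abstract.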


Updated: 2019-08-05
