Human-Aware Reinforcement Learning for Fault Recovery Using Contextual Gaussian Processes
Journal of Aerospace Information Systems (IF 1.5), Pub Date: 2021-05-21, DOI: 10.2514/1.i010921
Steve McGuire, P. Michael Furlong, Christoffer Heckman, Simon Julier, Nisar Ahmed

This work addresses the iterated nonstationary assistant selection problem: over the course of repeated interactions during a mission, an autonomous robot that experiences a fault must select a single human from a group of assistants to restore it to operation. Each assistant's performance changes as a function of their experience solving the problem. Our approach uses reinforcement learning, via a multi-armed bandit formulation, to learn the capabilities of each potential human assistant and decide which human to task. Building on our past work, this study evaluates whether a Gaussian-process-based machine learning method can effectively model the complex dynamics of human learning and forgetting. Simulation results show that our method can track human-like learning and forgetting dynamics in assistant performance. Using a novel selection policy called the proficiency window, we show that our technique can outperform baseline selection strategies while providing guarantees on human use. Our work offers a potentially effective alternative to dedicated human supervisors and applies to any human–robot system in which a set of humans oversees autonomous robot operations.
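To make the bandit-plus-Gaussian-process idea concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: it uses a generic GP upper-confidence-bound (GP-UCB) rule in place of the proficiency-window policy, whose details are not given in the abstract. Each assistant gets its own GP mapping a context to expected repair performance. The context features (a hypothetical pair of practice count and steps since last assignment), the squared-exponential kernel, and all hyperparameters are illustrative assumptions. Only the Python standard library is used, so the Cholesky factorization is written by hand.

```python
import math


def rbf(a, b, length=2.0, var=1.0):
    """Squared-exponential kernel over context tuples (hypothetical hyperparameters)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve low @ y = b for y."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


class AssistantGP:
    """GP model of one assistant's performance as a function of an assumed context."""

    def __init__(self, noise=0.1, prior_mean=0.0):
        self.X, self.y = [], []
        self.noise = noise
        self.prior_mean = prior_mean

    def add(self, context, reward):
        """Record one observed repair outcome for this assistant."""
        self.X.append(tuple(context))
        self.y.append(reward)

    def predict(self, context):
        """Return the posterior mean and variance of performance at `context`."""
        x = tuple(context)
        if not self.X:
            return self.prior_mean, rbf(x, x)
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j]) + (self.noise ** 2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        low = cholesky(K)
        # With K = L L^T: mean = m + (L^-1 k*) . (L^-1 (y - m)),
        # and var = k(x, x) - |L^-1 k*|^2.
        z = solve_lower(low, [v - self.prior_mean for v in self.y])
        v = solve_lower(low, [rbf(a, x) for a in self.X])
        mean = self.prior_mean + sum(vi * zi for vi, zi in zip(v, z))
        var = rbf(x, x) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)


def select_assistant(models, contexts, beta=2.0):
    """GP-UCB choice: index of the assistant with the largest mean + beta * std."""
    def ucb(i):
        mean, var = models[i].predict(contexts[i])
        return mean + beta * math.sqrt(var)
    return max(range(len(models)), key=ucb)
```

For example, if one hypothetical assistant has a single recorded score of 0.9 and another has never been tasked, `select_assistant` with the default `beta=2.0` picks the untried assistant (exploration), while `beta=0.0` picks the one with the higher posterior mean (exploitation). Because the context includes time since last use, the GP can in principle represent both improvement with practice and decay from disuse. A policy that guarantees human use, as the abstract describes for the proficiency window, would need an additional rule beyond this plain UCB score.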




Updated: 2021-05-22