Crossmodal attentive skill learner: learning in Atari and beyond with audio–video inputs
Autonomous Agents and Multi-Agent Systems (IF 1.9) Pub Date: 2020-01-13, DOI: 10.1007/s10458-019-09439-5
Dong-Ki Kim, Shayegan Omidshafiei, Jason Pazis, Jonathan P. How

This paper introduces the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic architecture [Harb et al. in When waiting is not an option: learning options with a deliberation cost. arXiv preprint arXiv:1709.04571, 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. Agents trained using our approach learn to attend to their various sensory modalities (e.g., audio, video) at the appropriate moments, thereby executing actions based on multiple sensory streams without reliance on supervisory data. We demonstrate empirically that the sensory attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. Further, we provide concrete examples in which the approach not only improves performance in a single task, but accelerates transfer to new tasks. We modify the Arcade Learning Environment [Bellemare et al. in J Artif Intell Res 47:253–279, 2013] to support audio queries (ALE-audio code available at https://github.com/shayegano/Arcade-Learning-Environment), and conduct evaluations of crossmodal learning in the Atari 2600 games H.E.R.O. and Amidar. Finally, building on the recent work of Babaeizadeh et al. [in: International conference on learning representations (ICLR), 2017], we open-source a fast hybrid CPU–GPU implementation of CASL (CASL code available at https://github.com/shayegano/CASL).
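The sensory attention mechanism described in the abstract, which weights each modality (e.g., audio, video) before fusing them for action selection, can be illustrated with a minimal sketch. This is not the paper's architecture; the scoring vector `w` and the feature values below are hypothetical stand-ins for learned parameters and encoder outputs:

```python
import numpy as np

def crossmodal_attention(features, w, temperature=1.0):
    """Fuse per-modality feature vectors via softmax attention.

    features: (num_modalities, feature_dim) array, e.g. [video, audio].
    w: (feature_dim,) scoring vector (stand-in for learned parameters).
    Returns the fused feature vector and the attention weights.
    """
    scores = features @ w / temperature   # one scalar relevance score per modality
    scores = scores - scores.max()        # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities
    fused = weights @ features            # attention-weighted sum of features
    return fused, weights

# Toy example: the scoring vector favors the video features,
# so video receives the larger attention weight.
video = np.array([1.0, 0.5, 0.0])
audio = np.array([0.0, 0.2, 1.0])
fused, weights = crossmodal_attention(
    np.stack([video, audio]), w=np.array([1.0, 0.0, 0.0])
)
```

Because the weights are a softmax, an irrelevant modality is softly filtered out rather than hard-masked, which matches the abstract's claim that the agent attends to different sensors at different moments.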

Updated: 2020-01-13