OtoWorld: Towards Learning to Separate by Learning to Move
arXiv - CS - Sound. Pub Date: 2020-07-12, DOI: arxiv-2007.06123
Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman

We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks. The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate. OtoWorld is built on three open source libraries: OpenAI Gym for environment and agent interaction, PyRoomAcoustics for ray-tracing and acoustics simulation, and nussl for training deep computer audition models. OtoWorld is the audio analogue of GridWorld, a simple navigation game. OtoWorld can be easily extended to more complex environments and games. To solve one episode of OtoWorld, an agent must move towards each sounding source in the auditory scene and "turn it off". The agent receives no other input than the current sound of the room. The sources are placed randomly within the room and can vary in number. The agent receives a reward for turning off a source. We present preliminary results on the ability of agents to win at OtoWorld. OtoWorld is open-source and available.
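The episode structure described above (sources placed at random, an agent that only hears the room, and a reward each time a source is turned off) can be illustrated with a toy sketch. This is not the OtoWorld API and uses none of OpenAI Gym, PyRoomAcoustics, or nussl. The class `ToySoundRoom`, its grid layout, the inverse-square "stereo loudness" observation, and the helper `run_random_episode` are all hypothetical names and simplifications chosen for illustration, loosely following the Gym-style `reset`/`step` convention.

```python
import random


class ToySoundRoom:
    """Hypothetical toy stand-in for an OtoWorld-style task (not the real API).

    The agent lives on a size x size grid. Its only observation is a pair of
    loudness values (left ear, right ear) summed over all active sources.
    """

    # Action index -> (dx, dy): up, down, left, right.
    ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, size=5, num_sources=2, seed=None):
        self.size = size
        self.num_sources = num_sources
        self.rng = random.Random(seed)
        self.agent = (0, 0)
        self.sources = set()

    def reset(self):
        # Place the agent and the sources on distinct random cells.
        cells = [(x, y) for x in range(self.size) for y in range(self.size)]
        picks = self.rng.sample(cells, self.num_sources + 1)
        self.agent = picks[0]
        self.sources = set(picks[1:])
        return self._listen()

    def _listen(self):
        # Assumption: crude inverse-square loudness at two "ears" offset by
        # half a cell on the x axis. Real room acoustics are far richer.
        ax, ay = self.agent
        left = right = 0.0
        for sx, sy in self.sources:
            dy2 = (sy - ay) ** 2
            left += 1.0 / (1.0 + (sx - (ax - 0.5)) ** 2 + dy2)
            right += 1.0 / (1.0 + (sx - (ax + 0.5)) ** 2 + dy2)
        return (left, right)

    def step(self, action):
        # Move one cell, clamped to the room boundaries.
        dx, dy = self.ACTIONS[action]
        x = min(max(self.agent[0] + dx, 0), self.size - 1)
        y = min(max(self.agent[1] + dy, 0), self.size - 1)
        self.agent = (x, y)

        # Reaching a sounding source "turns it off" and yields a reward.
        reward = 0.0
        if self.agent in self.sources:
            self.sources.remove(self.agent)
            reward = 1.0

        # The episode ends once every source is silent.
        done = not self.sources
        return self._listen(), reward, done, {}


def run_random_episode(env, max_steps=10_000, seed=0):
    """Run one episode with a uniformly random policy; return total reward."""
    rng = random.Random(seed)
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(rng.randrange(4))
        total += reward
        if done:
            break
    return total
```

Calling `run_random_episode(ToySoundRoom(size=4, num_sources=2, seed=1))` plays a single episode with a random policy. A learning agent would replace the random action choice with a policy that maps the two loudness values to a move.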


