Multimodal information bottleneck for deep reinforcement learning with multiple sensors
Neural Networks (IF 7.8), Pub Date: 2024-04-27, DOI: 10.1016/j.neunet.2024.106347
Bang You, Huaping Liu

Reinforcement learning has achieved promising results on robotic control tasks but struggles to effectively leverage information from multiple sensory modalities that differ widely in their characteristics. Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs, improving the sample efficiency and performance of reinforcement learning algorithms. However, the representations learned by these methods can capture information irrelevant to learning a policy and may degrade performance. We argue that it is helpful to compress the information that the learned joint representations carry about the raw multimodal observations, and propose a multimodal information bottleneck model that learns task-relevant joint representations from egocentric images and proprioception. Our model compresses multimodal observations while retaining their predictive information, yielding a compressed joint representation that fuses complementary information from visual and proprioceptive feedback and filters out task-irrelevant information in the raw observations. For computationally tractable optimization, we minimize an upper bound of our multimodal information bottleneck objective. Experimental evaluations on several challenging locomotion tasks with egocentric images and proprioception show that our method achieves better sample efficiency and zero-shot robustness to unseen white noise than leading baselines. We also empirically demonstrate that leveraging information from both egocentric images and proprioception is more helpful for learning locomotion policies than using either modality alone.
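
To make the general recipe concrete, the sketch below follows the standard variational information bottleneck pattern (maximize a predictive term, penalize a KL divergence to a fixed prior weighted by a coefficient beta), applied to a fused image-plus-proprioception latent. It is an illustrative assumption, not the authors' actual objective or architecture: the encoder sizes, the standard-normal prior, and the external `predictive_log_prob` term are all hypothetical choices made only for this example.

```python
# Illustrative sketch of a variational multimodal information-bottleneck loss.
# All architectural details here are assumptions for illustration, not the
# paper's model.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class MultimodalIBEncoder(nn.Module):
    def __init__(self, proprio_dim=24, latent_dim=50):
        super().__init__()
        # Encoder for egocentric image frames (assumed 3x84x84 input).
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
        )
        # Encoder for the proprioceptive state vector.
        self.proprio_enc = nn.Sequential(nn.Linear(proprio_dim, 128), nn.ReLU())
        # Fuse both modalities into a stochastic (Gaussian) joint latent.
        self.mu = nn.Linear(256, latent_dim)
        self.log_std = nn.Linear(256, latent_dim)

    def forward(self, image, proprio):
        h = torch.cat([self.image_enc(image), self.proprio_enc(proprio)], dim=-1)
        return Normal(self.mu(h), self.log_std(h).exp())

def ib_loss(posterior, predictive_log_prob, beta=1e-3):
    """Generic variational IB-style objective: retain predictive information
    (supplied externally as a log-likelihood term) while penalizing the
    information the latent keeps about raw observations via a KL divergence
    to a fixed standard-normal prior, a common tractable upper bound."""
    prior = Normal(torch.zeros_like(posterior.loc), torch.ones_like(posterior.scale))
    compression = kl_divergence(posterior, prior).sum(-1).mean()
    return -predictive_log_prob + beta * compression
```

In such a setup, a sample `z = posterior.rsample()` would feed both the policy and whatever predictive head supplies `predictive_log_prob` (for example, a prediction of future latents or rewards), and `beta` trades compression of the raw multimodal observations against the predictive information that is retained.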

Updated: 2024-04-27