Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning
arXiv - CS - Robotics Pub Date : 2021-08-03 , DOI: arxiv-2108.01254
Ryan Rowe, Shivam Singhal, Daqing Yi, Tapomayukh Bhattacharjee, Siddhartha S. Srinivasa

For robots to operate in a three-dimensional world and interact with humans, they must learn the spatial relationships among the objects around them. Reasoning about the state of the world requires inputs from many different sensory modalities, including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational "preference". We model this problem by examining how humans position objects given multiple features received from the vision and haptic modalities. However, organizational habits vary greatly between people, both in structure and in adherence. To account for user-specific organizational preferences, we add an additional modality, "utility" ($U$), which captures how useful a particular person perceives a given object to be. Models were trained either as generalized (over many different people) or as tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved to be learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of the $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we compared participants' preferences for desk organizations produced by a generalized random forest against those produced by a random model. On average, participants rated the random forest organizations 4.15 on a 5-point Likert scale, compared to 1.84 for the random model.
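As a rough illustration of what "all combinations of $\{H, U, V\}$ modalities" could mean in practice, the sketch below enumerates the modality subsets and filters a sample down to one of them. It assumes the comparison covers the seven non-empty subsets; the helper names and the sample feature values are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Modality labels as used in the abstract: haptics, utility, vision.
MODALITIES = ("H", "U", "V")


def modality_subsets(modalities=MODALITIES):
    """Return every non-empty combination of modalities, smallest first."""
    subsets = []
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            subsets.append("".join(combo))
    return subsets


def select_features(sample, subset):
    """Keep only the feature groups that belong to the chosen modalities.

    `sample` maps a modality label to its feature values (hypothetical layout).
    """
    return {label: sample[label] for label in subset}


if __name__ == "__main__":
    # Hypothetical feature values for one object, grouped by modality.
    sample = {"H": [0.2], "U": [3], "V": [1, 0]}
    for subset in modality_subsets():
        print(subset, select_features(sample, subset))
```

Each subset produced this way could then be used to train and score a separate model, which is one plausible way to arrive at a ranking such as $UV$ and $HUV$ being the most informative.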

Updated: 2021-08-04