Practical Cross-modal Manifold Alignment for Grounded Language
arXiv - CS - Robotics, Pub Date: 2020-09-01, DOI: arxiv-2009.05147
Andre T. Nguyen, Luke E. Richards, Gaoussou Youssouf Kebe, Edward Raff, Kasra Darvish, Frank Ferraro, Cynthia Matuszek

We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
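The triplet objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine distance, the `margin` value of 0.4, and the function names `cosine_distance` and `triplet_loss` are all assumptions chosen for illustration. The anchor is taken to be an embedding from one modality (say, a language description), the positive an embedding of the matching item from the other modality (say, an RGB-depth image), and the negative an embedding of a non-matching item.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`. The margin value is an assumed placeholder,
    not a setting taken from the paper.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Hypothetical 2-D embeddings: the language anchor and its matching image
# embedding point the same way, while the mismatched image is orthogonal.
language_anchor = [1.0, 0.0]
matching_image = [1.0, 0.0]
other_image = [0.0, 1.0]

well_aligned = triplet_loss(language_anchor, matching_image, other_image)
misaligned = triplet_loss(language_anchor, other_image, matching_image)
```

In this toy case the well-aligned triple incurs no loss, while the swapped triple is penalized, which is the pressure that pulls matching language and vision embeddings together in a shared space.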

Updated: 2020-09-14