RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
arXiv - CS - Multimedia. Pub Date: 2020-09-12, arXiv: 2009.05695
Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

We study an important, yet largely unexplored, problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering a 143 km^2 area) of RGB and aerial LIDAR depth images. We propose a novel joint-embedding-based method that effectively combines appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result, a median rank of 5, in matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advancement over prior works in both performance and scale. We conclude with qualitative results that highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.
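To make the joint-embedding idea concrete, below is a minimal sketch of a two-branch cross-modal embedding with an in-batch bidirectional triplet loss and a median-rank retrieval metric. This is an illustrative assumption of how such a model is commonly built (PyTorch, ResNet-18 backbones, a 512-d embedding, margin 0.2 are all placeholder choices), not the paper's exact architecture; in particular, the paper's combination of appearance and semantic cues is not reproduced here.

```python
# Hypothetical sketch of a two-branch joint embedding for RGB <-> LIDAR-depth
# retrieval. Backbones, embedding size, and margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class JointEmbedding(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # One CNN branch per modality: ground RGB and rendered LIDAR depth
        # (depth assumed rendered as a 3-channel image here).
        self.rgb_branch = models.resnet18(weights=None)
        self.rgb_branch.fc = nn.Linear(self.rgb_branch.fc.in_features, embed_dim)
        self.depth_branch = models.resnet18(weights=None)
        self.depth_branch.fc = nn.Linear(self.depth_branch.fc.in_features, embed_dim)

    def forward(self, rgb, depth):
        # L2-normalize so cosine similarity reduces to a dot product.
        f_rgb = F.normalize(self.rgb_branch(rgb), dim=-1)
        f_depth = F.normalize(self.depth_branch(depth), dim=-1)
        return f_rgb, f_depth

def triplet_loss(f_rgb, f_depth, margin=0.2):
    # Bidirectional hinge loss over in-batch negatives: matched RGB/depth
    # pairs sit on the diagonal of the similarity matrix.
    sim = f_rgb @ f_depth.t()
    pos = sim.diag().unsqueeze(1)
    cost_rgb = (margin + sim - pos).clamp(min=0)        # RGB -> depth direction
    cost_depth = (margin + sim - pos.t()).clamp(min=0)  # depth -> RGB direction
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost_rgb.masked_fill(mask, 0).mean() + cost_depth.masked_fill(mask, 0).mean()

def median_rank(f_rgb, f_depth):
    # Rank of each ground-truth depth match among all candidates
    # (1 = best); the median over queries is the reported retrieval metric.
    sim = f_rgb @ f_depth.t()
    ranks = (sim >= sim.diag().unsqueeze(1)).sum(dim=1)
    return ranks.float().median().item()

if __name__ == "__main__":
    model = JointEmbedding()
    rgb = torch.randn(8, 3, 224, 224)    # batch of ground RGB images
    depth = torch.randn(8, 3, 224, 224)  # matching rendered LIDAR depth images
    f_rgb, f_depth = model(rgb, depth)
    print(triplet_loss(f_rgb, f_depth).item(), median_rank(f_rgb, f_depth))
```

At evaluation scale (e.g., the 50K-pair test set described above), the same similarity matrix is computed query-by-query against all candidate locations, and the median of the resulting ranks summarizes matching quality.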

Last updated: 2020-09-15