Paper Title
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Paper Authors
Paper Abstract
We study an important, yet largely unexplored, problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs of RGB and aerial LIDAR depth images, covering a 143 km^2 area. We propose a novel joint-embedding-based method that effectively combines appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 when matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advancement over prior work in both performance and scale. We conclude with qualitative results that highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.
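To make the abstract's core ideas concrete, below is a minimal PyTorch sketch of a two-branch joint embedding with a triplet loss and a median-rank retrieval metric. Everything here is an illustrative assumption rather than the paper's actual implementation: the class name `BranchEncoder`, the feature dimensions, fusion by concatenation, and the margin value are hypothetical, and the appearance/semantic features are assumed to come from pretrained backbone and segmentation networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchEncoder(nn.Module):
    """One branch of a two-stream joint embedding (hypothetical design):
    fuses precomputed appearance and semantic features and projects
    them into a shared embedding space."""
    def __init__(self, app_dim=2048, sem_dim=512, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(app_dim + sem_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, embed_dim),
        )

    def forward(self, app_feat, sem_feat):
        # Concatenate appearance and semantic cues, then L2-normalize so
        # matching reduces to cosine similarity in the joint space.
        z = self.proj(torch.cat([app_feat, sem_feat], dim=1))
        return F.normalize(z, dim=1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss pulling matched RGB/depth pairs together."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

@torch.no_grad()
def median_rank(query_emb, gallery_emb):
    """Rank of the true match for each query, where query i matches
    gallery i; the abstract reports the median of these ranks."""
    sims = query_emb @ gallery_emb.t()            # cosine similarity matrix
    true_sim = sims.diag().unsqueeze(1)
    ranks = (sims > true_sim).sum(dim=1) + 1      # 1-based retrieval rank
    return ranks.float().median().item()

# Toy usage: random tensors stand in for real backbone/segmentation features.
rgb_branch, lidar_branch = BranchEncoder(), BranchEncoder()
rgb_app, rgb_sem = torch.randn(8, 2048), torch.randn(8, 512)
dep_app, dep_sem = torch.randn(8, 2048), torch.randn(8, 512)
q = rgb_branch(rgb_app, rgb_sem)              # ground RGB embeddings
g = lidar_branch(dep_app, dep_sem)            # aerial LIDAR depth embeddings
loss = triplet_loss(q, g, g.roll(1, dims=0))  # rolled gallery as negatives
print(loss.item(), median_rank(q, g))
```

One note on the design choice assumed here: L2-normalizing the embeddings makes the dot product a cosine similarity, so matching a query against a large gallery (e.g., 50K candidate locations) reduces to a single matrix multiply or an approximate nearest-neighbor lookup, which is what makes this style of joint embedding attractive at scale.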