Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence

Paper Title

Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence

Authors

Xiaohan Zhang, Xingyu Li, Waqas Sultani, Yi Zhou, Safwan Wshah

Abstract

Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a reference database of geo-tagged aerial images. The task is extremely challenging because of the drastic viewpoint change and the different capture times between the two views. Despite these difficulties, recent works have made outstanding progress on cross-view geo-localization benchmarks. However, existing methods still perform poorly on cross-area benchmarks, in which the training and testing data are captured from two different regions. We attribute this deficiency to the models' inability to extract the spatial configuration of visual feature layouts and to their overfitting on low-level details from the training set. In this paper, we propose GeoDTR, which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors that modulate the raw features and produce high-quality latent representations. In addition, we elaborate on two categories of data augmentation: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact, and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations help improve the performance of cross-view geo-localization models, especially on cross-area benchmarks. Moreover, we propose a counterfactual-based learning process that helps the geometric layout extractor explore spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts performance on both same-area and cross-area benchmarks.
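
The two augmentation categories named in the abstract can be illustrated with a minimal sketch. This is not the GeoDTR implementation: the abstract does not say which operations the authors use, so the horizontal flip, the brightness gain, and the function names `layout_simulation` and `semantic_augmentation` are all assumptions chosen for illustration. The sketch also assumes a north-up aerial image and a ground panorama centered on north, so that mirroring both views keeps the pair geometrically consistent.

```python
import numpy as np

def layout_simulation(aerial, ground, flip):
    """Hypothetical layout simulation: mirror both views together.

    The spatial arrangement changes, but every pixel value is preserved,
    so the low-level details stay intact.
    """
    if flip:
        aerial = aerial[:, ::-1]   # mirror the aerial image left-right
        ground = ground[:, ::-1]   # mirror the panorama (azimuth -> -azimuth)
    return aerial, ground

def semantic_augmentation(image, gain):
    """Hypothetical semantic augmentation: rescale brightness.

    Pixel values change, but no pixel moves, so the layout is unchanged.
    """
    return np.clip(image.astype(np.float32) * gain, 0.0, 255.0)

# Toy aerial (H, W, C) and ground panorama (H, W, C) images.
rng = np.random.default_rng(0)
aerial = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
ground = rng.integers(0, 256, size=(2, 8, 3), dtype=np.uint8)

flipped_aerial, flipped_ground = layout_simulation(aerial, ground, flip=True)
dimmed_ground = semantic_augmentation(ground, gain=0.8)
```

In a training pipeline, `flip` and `gain` would typically be sampled at random per pair; they are explicit arguments here only to keep the sketch deterministic.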
