Paper Title

Camera Pose Auto-Encoders for Improving Pose Regression

Authors

Yoli Shavit and Yosi Keller

Abstract

Absolute pose regressor (APR) networks are trained to estimate the pose of the camera given a captured image. They compute latent image representations from which the camera position and orientation are regressed. APRs provide a different tradeoff between localization accuracy, runtime, and memory, compared to structure-based localization schemes that provide state-of-the-art accuracy. In this work, we introduce Camera Pose Auto-Encoders (PAEs), multilayer perceptrons that are trained via a Teacher-Student approach to encode camera poses using APRs as their teachers. We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks. Specifically, we propose a lightweight test-time optimization in which the closest train poses are encoded and used to refine camera position estimation. This procedure achieves a new state-of-the-art position accuracy for APRs, on both the Cambridge Landmarks and 7Scenes benchmarks. We also show that train images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the train set at a low memory cost. Our code and pre-trained models are available at https://github.com/yolish/camera-pose-auto-encoders.
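The Teacher-Student idea above can be sketched with a toy example: a small MLP (the student PAE) is fit so that its encoding of a 7-D camera pose (3-D position plus a 4-D orientation quaternion) matches the latent vector a pre-trained APR (the teacher) would produce for the corresponding image. This is a minimal illustration, not the paper's implementation: the teacher latent is stood in for by a fixed random linear map, the poses are synthetic, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM, HIDDEN, LATENT = 7, 64, 32   # hypothetical sizes

# Stand-in for the frozen teacher APR: a fixed map from pose to latent.
teacher_W = rng.normal(size=(POSE_DIM, LATENT))

poses = rng.normal(size=(256, POSE_DIM))     # synthetic train poses
targets = poses @ teacher_W                  # "teacher" latents

# Student PAE: a two-layer MLP trained with manual backprop on MSE.
W1 = rng.normal(scale=0.1, size=(POSE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, LATENT));   b2 = np.zeros(LATENT)

lr, losses = 1e-2, []
for step in range(200):
    # forward pass
    z1 = poses @ W1 + b1
    h = np.maximum(z1, 0.0)                  # ReLU
    pred = h @ W2 + b2
    err = pred - targets
    losses.append(float(np.mean(err ** 2)))
    # backward pass (gradient of mean squared error)
    dpred = 2.0 * err / err.size
    dW2 = h.T @ dpred;  db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T
    dz1 = dh * (z1 > 0)
    dW1 = poses.T @ dz1; db1 = dz1.sum(axis=0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g
```

After training, encoding a pose with the student MLP approximates the teacher's latent without running the image backbone, which is what makes the paper's test-time refinement cheap: the nearest train poses can be encoded on the fly and used to adjust the position estimate.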
