Paper Title
MeshLoc: Mesh-Based Visual Localization
Paper Authors
Paper Abstract
Visual localization, i.e., the problem of camera pose estimation, is a central component of applications such as autonomous robots and augmented reality systems. A dominant approach in the literature, shown to scale to large scenes and to handle complex illumination and seasonal changes, is based on local features extracted from images. The scene representation is a sparse Structure-from-Motion point cloud that is tied to a specific local feature. Switching to another feature type requires an expensive feature matching step between the database images used to construct the point cloud. In this work, we thus explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation. We show that this approach can achieve state-of-the-art results. We further show that surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage, and even when rendering raw scene geometry without color or texture. Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.