Paper Title

DiffPoseNet: Direct Differentiable Camera Pose Estimation

Paper Authors

Parameshwara, Chethan M., Hari, Gokul, Fermüller, Cornelia, Sanket, Nitin J., Aloimonos, Yiannis

Paper Abstract


Current deep neural network approaches for camera pose estimation rely on scene structure for 3D motion estimation, but this decreases robustness and thereby makes cross-dataset generalization difficult. In contrast, classical structure-from-motion approaches estimate 3D motion utilizing optical flow and then compute depth. Their accuracy, however, depends strongly on the quality of the optical flow. To avoid this issue, direct methods have been proposed, which separate 3D motion from depth estimation but compute 3D motion using only image gradients in the form of normal flow. In this paper, we introduce NFlowNet, a network for normal flow estimation, which is used to enforce robust and direct constraints. In particular, normal flow is used to estimate relative camera pose based on the cheirality (depth positivity) constraint. We achieve this by formulating the optimization problem as a differentiable cheirality layer, which allows for end-to-end learning of camera pose. We perform extensive qualitative and quantitative evaluations of the proposed DiffPoseNet's sensitivity to noise and its generalization across datasets. We compare our approach to existing state-of-the-art methods on the KITTI, TartanAir, and TUM-RGBD datasets.
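The cheirality (depth positivity) idea in the abstract can be illustrated with a small NumPy sketch: after subtracting the rotational part of the normal flow at each pixel, the residual must carry the same sign as the translational field projected onto the gradient direction, or the implied depth would be negative. The helper functions and the hinge-style cost below are illustrative assumptions for a calibrated camera with unit focal length, not the paper's actual differentiable layer:

```python
import numpy as np

def rotational_flow(x, y, omega):
    # Rotational component of the motion field for a calibrated camera (f = 1).
    wx, wy, wz = omega
    u = wx * x * y - wy * (1 + x**2) + wz * y
    v = wx * (1 + y**2) - wy * x * y - wz * x
    return u, v

def translational_flow_dir(x, y, t):
    # Translational component, up to the unknown (positive) inverse depth.
    tx, ty, tz = t
    return x * tz - tx, y * tz - ty

def cheirality_cost(x, y, g, n_flow, omega, t):
    """Depth-positivity cost (illustrative): derotate the measured normal
    flow, then penalize pixels where the residual and the gradient-projected
    translational field disagree in sign (i.e., implied depth is negative)."""
    ur, vr = rotational_flow(x, y, omega)
    ut, vt = translational_flow_dir(x, y, t)
    derot = n_flow - (g[..., 0] * ur + g[..., 1] * vr)
    rho = g[..., 0] * ut + g[..., 1] * vt
    # Hinge on the product: zero when signs agree, positive when they conflict.
    return np.maximum(0.0, -derot * rho).sum()
```

With synthetic normal flow generated from a true pose and positive depths, this cost is zero at the true pose and grows for a conflicting candidate (e.g., a reversed translation), which is the signal a differentiable cheirality layer could descend on.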
