Paper Title
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
Paper Authors
Paper Abstract
Locating 3D objects from a single RGB image via Perspective-n-Points (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient w.r.t. object pose. Yet, learning the entire set of unrestricted 2D-3D points from scratch fails to converge with existing approaches, since the deterministic pose is inherently non-differentiable. In this paper, we propose EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distributions. The underlying principle unifies the existing approaches and resembles the attention mechanism. EPro-PnP significantly outperforms competitive baselines, closing the gap between PnP-based methods and the task-specific leaders on the LineMOD 6DoF pose estimation and nuScenes 3D object detection benchmarks.
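To make the abstract's loss concrete: if the pose likelihood is taken to be proportional to exp(-1/2 of the sum of weighted squared reprojection errors), then for a Dirac target at the ground-truth pose the KL divergence between target and predicted pose distributions reduces to the reprojection cost at the ground truth plus the log of a normalizing integral over SE(3), which can be estimated by Monte Carlo sampling. The PyTorch sketch below illustrates this idea only; it is not the authors' implementation, and the projection helper, the sampling interface (externally supplied pose samples with their proposal log-densities), and names such as `kl_pose_loss` are assumptions made for illustration.

```python
# A minimal sketch of the probabilistic PnP loss idea (not the authors' code).
# Assumes the pose likelihood is proportional to exp(-1/2 * sum of weighted
# squared reprojection errors), and that pose samples plus their proposal
# log-densities are supplied externally. All names below are hypothetical.

import torch


def project(x3d, rot, trans, cam_k):
    """Pinhole projection of 3D points under pose (rot, trans)."""
    pts_cam = x3d @ rot.T + trans                    # (N, 3) camera-frame points
    pts_img = pts_cam @ cam_k.T                      # apply intrinsics
    return pts_img[:, :2] / pts_img[:, 2:3].clamp(min=1e-6)  # (N, 2) pixel coords


def weighted_log_likelihood(x2d, x3d, w2d, rot, trans, cam_k):
    """Unnormalized pose log-likelihood: -1/2 * ||weighted reprojection error||^2."""
    err = (project(x3d, rot, trans, cam_k) - x2d) * w2d       # (N, 2)
    return -0.5 * err.pow(2).sum()


def kl_pose_loss(x2d, x3d, w2d, cam_k, gt_rot, gt_trans,
                 sample_rots, sample_trans, proposal_logq):
    """
    Monte Carlo estimate of KL(target || predicted pose distribution) for a
    Dirac target at the ground-truth pose:
        loss = -log p(y_gt) + log( integral over poses y of exp(log p(y)) ),
    with the integral estimated by importance sampling over the given samples.
    """
    nll_gt = -weighted_log_likelihood(x2d, x3d, w2d, gt_rot, gt_trans, cam_k)
    logp = torch.stack([
        weighted_log_likelihood(x2d, x3d, w2d, r, t, cam_k)
        for r, t in zip(sample_rots, sample_trans)
    ])                                               # (K,) log-likelihood per sample
    n_samples = torch.tensor(float(len(sample_rots)))
    log_norm = torch.logsumexp(logp - proposal_logq, dim=0) - n_samples.log()
    return nll_gt + log_norm
```

In a training pipeline, `x2d`, `x3d`, and `w2d` would be predicted by a network head, so minimizing this loss shapes the 2D-3D correspondences and their weights end-to-end, which is the behavior the abstract describes.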