Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

Paper title

Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

Authors

Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari

Abstract

We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration process resolves the scale and depth ambiguities in the per-frame predictions, and generally improves the estimate of all pose parameters. By leveraging multi-view constraints, our method also resolves occlusions and handles objects that are out of view in individual frames, thus reconstructing all objects into a single globally consistent CAD representation of the scene. In comparison to the state-of-the-art single-frame method Mask2CAD that we build on, we achieve substantial improvements on the Scan2CAD dataset (from 11.6% to 30.7% class average accuracy).
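To make the depth ambiguity mentioned in the abstract concrete, the sketch below shows one generic way that multiple views can pin down an object's 3D center: each frame contributes only a viewing ray (depth along the ray is unknown), and a least-squares intersection of the rays recovers the point. This is an illustrative assumption, not the optimization used in the paper; the function names (`triangulate_center`, `solve3`, `normalize`) and the camera positions are hypothetical, and camera poses are assumed to be known.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x


def triangulate_center(origins, directions):
    """Least-squares point closest to a set of rays.

    Each ray is (camera origin o_i, viewing direction d_i). A single ray
    leaves depth undetermined; two or more non-parallel rays do not.
    Solves  sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) o_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * o[j]
    return solve3(A, b)


# Hypothetical setup: three camera positions observing one object center.
true_center = [1.0, 2.0, 5.0]
cam_origins = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]
# Per-frame observations give only a direction; its length (depth) is discarded.
rays = [normalize([c - o for c, o in zip(true_center, org)]) for org in cam_origins]
estimate = triangulate_center(cam_origins, rays)
print(estimate)
```

With noise-free rays the estimate coincides with the true center up to floating-point error; with noisy per-frame predictions the same least-squares form averages out the disagreement between views, which is the intuition behind combining frames rather than trusting any single one.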
