Paper Title
Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction
Paper Authors
Paper Abstract
Our work learns a unified model for single-view 3D reconstruction of objects from hundreds of semantic categories. As a scalable alternative to direct 3D supervision, our work relies on segmented image collections for learning the 3D structure of generic categories. Unlike prior works that use similar supervision but learn independent category-specific models from scratch, our approach of learning a unified model simplifies the training process while also allowing the model to benefit from the common structure across categories. Using image collections from standard recognition datasets, we show that our approach allows learning 3D inference for over 150 object categories. We evaluate using two datasets and show, both qualitatively and quantitatively, that our unified reconstruction approach improves over prior category-specific reconstruction baselines. Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories, and we empirically show that increasing the number of training categories improves the reconstruction quality.
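To make the three-stage recipe named in the title concrete, below is a minimal PyTorch sketch of a pre-train / self-train / distill pipeline. It is not the authors' implementation: the toy network (`UnifiedReconstructor`), the mask-based `silhouette_loss`, the noise augmentation, and all sizes are illustrative assumptions, since the abstract only states that training uses segmented image collections rather than direct 3D supervision.

```python
# Hypothetical sketch of a pre-train -> self-train -> distill pipeline.
# Everything here is an illustrative stand-in, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG = 64   # assumed square image size
VOX = 32   # assumed voxel resolution of the 3D output

class UnifiedReconstructor(nn.Module):
    """One shared model for all categories, instead of per-category models."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * IMG * IMG, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, VOX ** 3)  # occupancy logits

    def forward(self, images):                      # images: (B, 3, IMG, IMG)
        return self.decoder(self.encoder(images))   # (B, VOX**3)

def silhouette_loss(occ_logits, masks):
    # Stand-in for a differentiable-rendering term that compares the projected
    # shape against the 2D segmentation mask (the only supervision assumed).
    projected = torch.sigmoid(occ_logits).mean(dim=1)  # crude coverage proxy
    target = masks.float().mean(dim=(1, 2))
    return F.mse_loss(projected, target)

def train_stage(model, batches, loss_fn, lr=1e-4):
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for inputs, targets in batches:
        loss = loss_fn(model(inputs), targets)
        optim.zero_grad()
        loss.backward()
        optim.step()

# Toy data standing in for a segmented image collection.
images = torch.rand(4, 3, IMG, IMG)
masks = torch.rand(4, IMG, IMG) > 0.5

# Stage 1 (pre-train): fit one unified model with mask supervision only.
teacher = UnifiedReconstructor()
train_stage(teacher, [(images, masks)], silhouette_loss)

# Stage 2 (self-train): pseudo-label unlabeled images with the current model,
# then train on perturbed views of the same images for consistency.
unlabeled = torch.rand(4, 3, IMG, IMG)
with torch.no_grad():
    pseudo = teacher(unlabeled)
augmented = unlabeled + 0.05 * torch.randn_like(unlabeled)
train_stage(teacher, [(augmented, pseudo)], F.mse_loss)

# Stage 3 (distill): compress the self-trained teacher into a final student,
# which is then used for single-view (including zero-shot) inference.
student = UnifiedReconstructor()
with torch.no_grad():
    teacher_out = teacher(images)
train_stage(student, [(images, teacher_out)], F.mse_loss)
```

The point of the sketch is the structure, not the particulars: one model is trained jointly across categories from 2D masks, its own predictions then serve as extra supervision, and a final unified student is distilled from the result.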