Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Paper Title

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Authors

Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang

Abstract

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.
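The three-level hierarchy named in the abstract (room layout with camera pose, 3D object bounding boxes, object meshes) can be pictured as a nested data structure. The following is a minimal sketch only, not the authors' implementation: every class, field, and function name here is hypothetical, and the parameterization (camera pitch and roll, boxes as center, size, and yaw about a vertical y axis, meshes stored in a canonical unit cube) is an assumption chosen for illustration rather than something stated in the abstract.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    """Oriented 3D box (assumed parameterization: center, size, yaw)."""
    center: Vec3
    size: Vec3   # full extents along the box's local x, y, z axes
    yaw: float   # rotation about the vertical (y) axis, in radians


@dataclass
class Mesh:
    """Triangle mesh; vertices assumed to lie in [-0.5, 0.5]^3."""
    vertices: List[Vec3]
    faces: List[Tuple[int, int, int]]


@dataclass
class SceneObject:
    label: str
    box: Box3D   # second level: 3D object bounding box
    mesh: Mesh   # third level: object mesh in canonical coordinates


@dataclass
class Scene:
    camera_pitch: float  # first level: camera pose (assumed pitch/roll)
    camera_roll: float
    layout: Box3D        # first level: room layout as a box
    objects: List[SceneObject] = field(default_factory=list)


def place_mesh(obj: SceneObject) -> List[Vec3]:
    """Scale, rotate, and translate canonical mesh vertices into the box."""
    cx, cy, cz = obj.box.center
    sx, sy, sz = obj.box.size
    c, s = math.cos(obj.box.yaw), math.sin(obj.box.yaw)
    placed = []
    for vx, vy, vz in obj.mesh.vertices:
        dx, dy, dz = vx * sx, vy * sy, vz * sz  # scale to box extents
        # Rotate about the y axis, then translate to the box center.
        placed.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return placed
```

In this sketch the coarse levels constrain the fine one: a mesh has no scene position of its own until `place_mesh` maps it through its bounding box, which is one simple way to read the claim that each component's context helps the others.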
