Title
Simple and Effective Synthesis of Indoor 3D Scenes
Authors
Abstract
We study the problem of synthesizing immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints, including viewpoints that extrapolate far beyond the input images while maintaining 3D consistency. Existing approaches are highly complex, with many separately trained stages and components. We propose a simple alternative: an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images. On the Matterport3D and RealEstate10K datasets, our approach significantly outperforms prior work in both human evaluations and FID scores. Further, we show that our model is useful for generative data augmentation. A vision-and-language navigation (VLN) agent trained on trajectories spatially perturbed by our model improves success rate by up to 1.5% over a state-of-the-art baseline on the R2R benchmark. Our code will be made available to facilitate generative data augmentation and applications to downstream robotics and embodied AI tasks.
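The abstract's key input representation is the reprojection of an incomplete point cloud into a novel camera view, yielding a partial RGB-D image that the GAN completes. A minimal sketch of that reprojection step is shown below, assuming a standard pinhole camera model with intrinsics `K` and extrinsics `(R, t)`; the function name and all parameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of point-cloud reprojection: colored 3D points are
# projected into a novel view to form an incomplete RGB-D image (gaps stay 0),
# i.e. the kind of input the abstract's image-to-image GAN would complete.
import numpy as np

def reproject(points, colors, K, R, t, h, w):
    """Project colored world-space points into an (h, w) view; return RGB-D."""
    cam = points @ R.T + t            # world -> camera coordinates
    z = cam[:, 2]
    front = z > 1e-6                  # keep points in front of the camera
    cam, z, colors = cam[front], z[front], colors[front]
    uv = cam @ K.T                    # pinhole projection
    u = (uv[:, 0] / z).astype(int)    # pixel column
    v = (uv[:, 1] / z).astype(int)    # pixel row
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[ok], v[ok], z[ok], colors[ok]
    rgbd = np.zeros((h, w, 4), dtype=np.float32)  # unobserved pixels stay 0
    for i in np.argsort(-z):          # draw far-to-near so nearer points win
        rgbd[v[i], u[i], :3] = colors[i]
        rgbd[v[i], u[i], 3] = z[i]
    return rgbd
```

In a pipeline like the one described, the resulting sparse RGB-D image would be fed to the GAN, which hallucinates the missing pixels to produce a complete high-resolution frame for the novel viewpoint.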