Paper Title

Continuous Scene Representations for Embodied AI

Authors

Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Abstract

We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings. Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation. Our key insight is to embed pair-wise relationships between objects in a latent space. This allows for a richer representation compared to discrete relations (e.g., [support], [next-to]) commonly used for building scene representations. CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations. Using CSR, we outperform state-of-the-art approaches for the challenging downstream task of visual room rearrangement, without any task-specific training. Moreover, we show the learned embeddings capture salient spatial details of the scene and show applicability to real-world data. A summary video and code are available at https://prior.allenai.org/projects/csr.
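
To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the data structure the abstract describes: a scene graph whose nodes hold object feature embeddings and whose edges hold continuous pairwise-relationship embeddings, updated on-the-fly as the agent observes new frames. All names here (SceneGraph, toy_relation_fn, the cosine-similarity matching, and the embedding size) are illustrative assumptions.

```python
# Hypothetical sketch of a continuous scene representation:
# nodes = object feature embeddings, edges = continuous relation embeddings.
from dataclasses import dataclass, field

import numpy as np

EMBED_DIM = 128  # assumed embedding size


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> object embedding
    edges: dict = field(default_factory=dict)   # (id_a, id_b) -> relation embedding

    def match(self, feature: np.ndarray, threshold: float = 0.8):
        """Return an existing node whose embedding is close to `feature`
        (cosine similarity above threshold), or None if no node matches."""
        best_id, best_sim = None, threshold
        for node_id, emb in self.nodes.items():
            sim = float(feature @ emb) / (np.linalg.norm(feature) * np.linalg.norm(emb) + 1e-8)
            if sim > best_sim:
                best_id, best_sim = node_id, sim
        return best_id

    def update(self, detections: list, relation_fn):
        """Add newly observed objects, then (re)compute pairwise relation
        embeddings among the objects visible in the current observation."""
        visible = []
        for feat in detections:
            node_id = self.match(feat)
            if node_id is None:
                node_id = len(self.nodes)
                self.nodes[node_id] = feat
            else:
                # running average keeps a tracked node's embedding current
                self.nodes[node_id] = 0.9 * self.nodes[node_id] + 0.1 * feat
            visible.append(node_id)
        for a in visible:
            for b in visible:
                if a != b:
                    self.edges[(a, b)] = relation_fn(self.nodes[a], self.nodes[b])


def toy_relation_fn(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Stand-in for a learned relation encoder: concatenate the two object
    embeddings and project them with a fixed random matrix."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((EMBED_DIM, 2 * EMBED_DIM))
    return proj @ np.concatenate([feat_a, feat_b])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    graph = SceneGraph()
    frame = [rng.standard_normal(EMBED_DIM) for _ in range(3)]  # 3 detected objects
    graph.update(frame, toy_relation_fn)
    print(len(graph.nodes), "nodes,", len(graph.edges), "relation edges")
```

The cosine-similarity matching and running-average node update are simplifications standing in for the learned object matching and tracking described in the paper; the key point they illustrate is that relations are stored as continuous vectors rather than discrete labels such as [support] or [next-to].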
