Paper Title


Contrastive Unsupervised Learning of World Model with Invariant Causal Features

Authors

Poudel, Rudra P. K., Pandya, Harit, Cipolla, Roberto

Abstract


In this paper, we present a world model that learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn invariant causal features, which enforces invariance across augmentations of the irrelevant parts or styles of the observation. World-model-based reinforcement learning methods optimize representation learning and the policy independently; thus, a naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation-learning module. We propose an intervention-invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point-navigation tasks on the iGibson dataset. Moreover, our proposed model excels at sim-to-real transfer of our perception-learning module. Finally, we evaluate our approach on the DeepMind Control Suite, where invariance is enforced only implicitly since depth is not available. Nevertheless, our proposed model performs on par with its state-of-the-art counterpart.
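The abstract's core recipe can be illustrated in miniature: embed an observation and a style-augmented ("intervened") view with a shared encoder, pull matching views together with a contrastive (InfoNCE-style) loss, and add a depth-regression auxiliary loss so the representation does not collapse. The sketch below is an illustrative toy with linear stand-ins for the paper's networks, not the authors' implementation; all function names, dimensions, and the placeholder depth targets are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W):
    # Toy linear encoder standing in for the paper's CNN encoder (assumption).
    z = obs @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm features

def info_nce(z_a, z_b, temperature=0.1):
    # Contrastive loss: the matching augmented view is the positive,
    # all other samples in the batch act as negatives.
    logits = (z_a @ z_b.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def depth_loss(z, depth_target, V):
    # Auxiliary depth-prediction head (linear stand-in), which supplies the
    # supervisory signal that keeps the contrastive features from collapsing.
    pred = z @ V
    return np.mean((pred - depth_target) ** 2)

batch, obs_dim, feat_dim = 8, 32, 16
W = rng.normal(size=(obs_dim, feat_dim))   # encoder weights
V = rng.normal(size=(feat_dim, 1))         # depth-head weights

obs = rng.normal(size=(batch, obs_dim))
# Style intervention via data augmentation (here: small additive noise,
# standing in for e.g. color jitter on RGB observations).
aug = obs + 0.05 * rng.normal(size=obs.shape)
depth = rng.normal(size=(batch, 1))        # placeholder depth targets

z_a, z_b = encode(obs, W), encode(aug, W)
total = info_nce(z_a, z_b) + depth_loss(z_a, depth, V)
```

In the paper's setting, the encoder and depth head are deep networks trained jointly, and the depth loss is what makes the learned features intervention-invariant rather than degenerate.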
