Paper Title

Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

Paper Authors

Russell Mendonca, Xinyang Geng, Chelsea Finn, Sergey Levine

Paper Abstract

Reinforcement learning algorithms can acquire policies for complex tasks autonomously. However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches are either not able to extrapolate well, or can do so at the expense of requiring extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to continue training policies and value functions for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task.
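To make the two-step recipe in the abstract concrete, below is a minimal sketch of the idea, not the authors' implementation: (1) adapt a meta-learned dynamics model to a new task with a few gradient steps on off-policy data, and (2) relabel stored transitions with the adapted model to produce synthetic experience. To stay self-contained it assumes a linear dynamics model with plain-numpy gradient steps; all names (`predict`, `adapt_model`, `relabel`, `W_meta`) are hypothetical, and the full method also meta-trains the model with a gradient-based meta-learning objective and predicts rewards as well as next states, both omitted here for brevity.

```python
# Illustrative sketch of the MIER recipe (hypothetical, simplified code):
# model identification via gradient adaptation, then experience relabeling.
import numpy as np

rng = np.random.default_rng(0)

def predict(W, state, action):
    """Linear dynamics model: next_state ~ W @ [state; action]."""
    return W @ np.concatenate([state, action])

def model_loss_grad(W, batch):
    """Mean squared next-state prediction error and its gradient w.r.t. W."""
    loss, grad = 0.0, np.zeros_like(W)
    for s, a, s_next in batch:
        x = np.concatenate([s, a])
        err = W @ x - s_next
        loss += 0.5 * err @ err
        grad += np.outer(err, x)  # gradient of 0.5 * ||W x - s'||^2
    return loss / len(batch), grad / len(batch)

def adapt_model(W_meta, off_policy_batch, lr=0.1, steps=5):
    """Step 1 (model identification): adapt the meta-learned dynamics
    model to the new task with a few gradient steps on off-policy data."""
    W = W_meta.copy()
    for _ in range(steps):
        _, g = model_loss_grad(W, off_policy_batch)
        W -= lr * g
    return W

def relabel(W_adapted, replay_buffer):
    """Step 2 (experience relabeling): regenerate next-states for stored
    (state, action) pairs under the adapted model, yielding synthetic
    experience for the out-of-distribution task."""
    return [(s, a, predict(W_adapted, s, a)) for s, a, _ in replay_buffer]

# Hypothetical usage with random stand-in data: adapt on a handful of real
# transitions from the new task, then relabel the replay buffer.
state_dim, action_dim = 3, 2
W_meta = rng.normal(size=(state_dim, state_dim + action_dim))
real_batch = [(rng.normal(size=state_dim), rng.normal(size=action_dim),
               rng.normal(size=state_dim)) for _ in range(8)]
W_new = adapt_model(W_meta, real_batch)
synthetic_batch = relabel(W_new, real_batch)
```

Per the abstract, the relabeled transitions can then be fed to a standard off-policy RL algorithm in place of real experience from the new task, so the policy and value function keep training without any meta-learning of their own.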
