Paper Title

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Authors

Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Ruslan Salakhutdinov

Abstract

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and domains with linear reward functions.
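The core question in the abstract — in hindsight, for which task was a trajectory (nearly) optimal? — can be sketched as a softmax over the trajectory's returns under a set of candidate reward functions. This is an illustrative sketch only: the function names and toy tasks below are hypothetical, and the paper's full method additionally corrects for each task's partition function, which is omitted here.

```python
import numpy as np

def hindsight_task_distribution(trajectory, reward_fns, temperature=1.0):
    """Score a trajectory under each candidate reward function and return
    a softmax distribution over which task it looks most optimal for.

    `trajectory` is a list of (state, action) pairs; `reward_fns` is a
    list of callables r(state, action).  (Hypothetical helper, not the
    authors' implementation.)
    """
    returns = np.array([sum(r(s, a) for s, a in trajectory)
                        for r in reward_fns], dtype=float)
    logits = returns / temperature
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical example: two goal-reaching tasks on a 1-D line.
reach_0 = lambda s, a: -abs(s - 0.0)   # reward for staying near x = 0
reach_5 = lambda s, a: -abs(s - 5.0)   # reward for staying near x = 5
trajectory = [(0.0, 0), (1.0, 0), (0.0, 0)]  # hovers near x = 0

probs = hindsight_task_distribution(trajectory, [reach_0, reach_5])
```

Here the trajectory is assigned to the `reach_0` task with high probability, so it would be relabeled as experience for that task before being replayed by the RL algorithm.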
