Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

Paper Title

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

Authors

Yannick Schroecker, Charles Isbell

Abstract

This work considers two distinct settings: imitation learning and goal-conditioned reinforcement learning. In either case, effective solutions require the agent to reliably reach a specified state (a goal) or set of states (a demonstration). Drawing a connection between probabilistic long-term dynamics and the desired value function, this work introduces an approach that uses recent advances in density estimation to effectively learn to reach a given state. As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is efficient and does not suffer from hindsight bias in stochastic domains. As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample efficiency on standard benchmark tasks.
