Paper title
Sample-efficient reinforcement learning using deep Gaussian processes
Paper authors
Paper abstract
Reinforcement learning provides a framework for learning, through trial and error, which actions to take to complete a task. In many applications observing interactions is costly, necessitating sample-efficient learning. Model-based reinforcement learning improves efficiency by learning to simulate the world dynamics; the challenge is that model inaccuracies rapidly accumulate over planned trajectories. We introduce deep Gaussian processes, in which the depth of the composition provides model complexity, while prior knowledge of the dynamics brings smoothness and structure. Our approach can sample from a Bayesian posterior over trajectories. We demonstrate greatly improved early sample-efficiency over competing methods. This is shown across a number of continuous control tasks, including the half-cheetah, whose contact dynamics have previously posed an insurmountable problem for earlier sample-efficient Gaussian-process-based models.