Paper Title

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Paper Authors

Ghadirzadeh, Ali, Poklukar, Petra, Arndt, Karol, Finn, Chelsea, Kyrki, Ville, Kragic, Danica, Björkman, Mårten

Paper Abstract

We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve the robotics tasks.
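The two-part decomposition described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical (the dimensions, the linear maps `W_policy` and `W_dec`, and the function names are illustrative stand-ins, not the paper's actual architecture): a sub-policy maps a state to a distribution over a low-dimensional action latent variable, and a pretrained generative decoder expands a sampled latent into a full sequence of motor actions, so RL only has to search the small latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, SEQ_LEN, ACTION_DIM = 4, 2, 5, 3

# Hypothetical linear sub-policy (part (i)): maps a state to the mean
# of a Gaussian over the action latent variable z.
W_policy = 0.1 * rng.normal(size=(LATENT_DIM, STATE_DIM))

def sub_policy(state):
    mean = W_policy @ state
    # Exploration happens in the small latent space, not over raw
    # motor commands, which is what keeps exploration safe.
    return mean + 0.1 * rng.normal(size=LATENT_DIM)

# Hypothetical pretrained decoder (part (ii)): a stand-in for the
# generative model that outputs a whole motor-action sequence given z.
W_dec = 0.1 * rng.normal(size=(SEQ_LEN * ACTION_DIM, LATENT_DIM))

def decode(z):
    return (W_dec @ z).reshape(SEQ_LEN, ACTION_DIM)

state = rng.normal(size=STATE_DIM)
z = sub_policy(state)          # 2-dimensional latent sample
actions = decode(z)            # (SEQ_LEN, ACTION_DIM) motor-action sequence
print(actions.shape)
```

In the actual framework both components are deep networks and the decoder is trained in advance on valid motor sequences; the point of the sketch is only the interface: the RL problem is reduced to choosing a 2-dimensional latent instead of a 15-dimensional action sequence.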
