Latent Policies for Adversarial Imitation Learning

Paper Title

Latent Policies for Adversarial Imitation Learning

Authors

Wang, Tianyu; Karnwal, Nikhil; Atanasov, Nikolay

Abstract

This paper considers learning robot locomotion and manipulation tasks from expert demonstrations. Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions and, in turn, uses a reward defined by the discriminator output to optimize a policy generator for the agent. This generative adversarial training approach is very powerful but depends on a delicate balance between the discriminator and the generator training. In high-dimensional problems, the discriminator training may easily overfit or exploit associations with task-irrelevant features for transition classification. A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems. We use an action encoder-decoder model to obtain a low-dimensional latent action space and train a LAtent Policy using Adversarial imitation Learning (LAPAL). The encoder-decoder model can be trained offline from state-action pairs to obtain a task-agnostic latent action representation, or online, simultaneously with the discriminator and generator training, to obtain a task-aware latent action representation. We demonstrate that LAPAL training is stable, with near-monotonic performance improvement, and achieves expert performance in most locomotion and manipulation tasks, while a GAIL baseline converges more slowly and does not achieve expert performance in high-dimensional environments.
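
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear decoder, the logistic discriminator, the choice to score state and latent-action pairs, the reward form -log(1 - D), and all names and dimensions (decode_action, discriminator, gail_reward, state_dim, latent_dim, action_dim) are assumptions made purely for illustration. It shows a low-dimensional latent action being decoded into a full action, and a discriminator output being turned into a reward for the policy.

```python
import math
import random


def decode_action(state, latent, weights):
    """Hypothetical linear decoder a = W [s; z]; stands in for a learned decoder."""
    x = list(state) + list(latent)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def discriminator(state, latent, params):
    """Logistic score in (0, 1): estimated probability that (s, z) is from the expert."""
    x = list(state) + list(latent)
    logit = sum(p * xi for p, xi in zip(params, x))
    return 1.0 / (1.0 + math.exp(-logit))


def gail_reward(d_out, eps=1e-8):
    """One common GAIL-style reward, -log(1 - D); assumed here, not taken from the paper."""
    return -math.log(1.0 - d_out + eps)


rng = random.Random(0)
state_dim, latent_dim, action_dim = 4, 2, 6  # illustrative sizes only

# Randomly initialised stand-ins for trained decoder and discriminator parameters.
W = [[rng.uniform(-1, 1) for _ in range(state_dim + latent_dim)]
     for _ in range(action_dim)]
theta = [rng.uniform(-1, 1) for _ in range(state_dim + latent_dim)]

state = [rng.uniform(-1, 1) for _ in range(state_dim)]
latent = [rng.uniform(-1, 1) for _ in range(latent_dim)]  # what a latent policy would output

action = decode_action(state, latent, W)   # full action sent to the environment
d = discriminator(state, latent, theta)    # discriminator score in the latent space
reward = gail_reward(d)                    # reward used to update the latent policy
```

In this sketch the policy only has to choose a 2-dimensional latent action even though the environment receives a 6-dimensional action, which is the dimensionality reduction the abstract credits with stabilising training.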
