Paper Title
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
Paper Authors
Paper Abstract
Poor sample efficiency continues to be the primary challenge for the deployment of deep Reinforcement Learning (RL) algorithms in real-world applications, in particular for visuo-motor control. Model-based RL has the potential to be highly sample efficient by concurrently learning a world model and using synthetic rollouts for planning and policy improvement. In practice, however, sample-efficient learning with model-based RL is bottlenecked by the exploration challenge. In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample efficiency of model-based RL. Simply appending demonstrations to the interaction dataset, however, does not suffice. We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework. We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse-reward tasks compared to prior approaches in the low-data regime (100K interaction steps, 5 demonstrations). Code and videos are available at: https://nicklashansen.github.io/modemrl
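
To make the three phases named in the abstract concrete, below is a minimal runnable sketch in Python. Everything in it (DummyEnv, DummyPolicy, the 0.25 demo ratio, step and episode counts) is a toy stand-in invented for illustration under the abstract's description, not the paper's actual implementation; the real code is at the URL above.

```python
# Toy sketch of a three-phase demo-accelerated RL loop:
# (1) policy pretraining on demos, (2) targeted exploration seeded by the
# pretrained policy, (3) training batches that oversample demonstration data.
import random

class DummyEnv:
    """Toy stand-in environment: scalar observations, sparse 0/1 reward."""
    def reset(self):
        return random.random()

    def step(self, action):
        obs = random.random()
        reward = 1.0 if abs(action - obs) < 0.05 else 0.0  # sparse reward
        done = random.random() < 0.02
        return obs, reward, done

class DummyPolicy:
    """Toy stand-in policy: a single bias term fit toward demo actions."""
    def __init__(self):
        self.bias = 0.0

    def update_supervised(self, obs, action):
        # Behavior-cloning-style update: nudge prediction toward demo action.
        self.bias += 0.01 * (action - (obs + self.bias))

    def act(self, obs, noise=0.0):
        return obs + self.bias + random.gauss(0.0, noise)

def phase1_bc_pretrain(policy, demos, steps=500):
    """Phase 1: pretrain the policy on demonstration (obs, action) pairs."""
    for _ in range(steps):
        obs, action, _, _ = random.choice(demos)
        policy.update_supervised(obs, action)

def phase2_seed_buffer(policy, env, episodes=5, noise=0.1):
    """Phase 2: targeted exploration -- roll out the pretrained policy with
    exploration noise, so the interaction buffer starts near demonstrated
    behavior instead of being filled by random exploration."""
    buffer = []
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs, noise)
            next_obs, reward, done = env.step(action)
            buffer.append((obs, action, reward, next_obs))
            obs = next_obs
    return buffer

def sample_batch(demos, interactions, batch_size=32, demo_ratio=0.25):
    """Phase 3 ingredient: oversample demonstrations -- a fixed fraction of
    every training batch is drawn from the tiny demo set, no matter how
    large the interaction buffer grows."""
    n_demo = int(batch_size * demo_ratio)
    batch = random.choices(demos, k=n_demo)
    batch += random.choices(interactions, k=batch_size - n_demo)
    random.shuffle(batch)
    return batch

if __name__ == "__main__":
    env, policy = DummyEnv(), DummyPolicy()
    # Toy "demonstrations": transitions whose action matches the observation.
    demos = [(o, o, 1.0, o) for o in (random.random() for _ in range(250))]
    phase1_bc_pretrain(policy, demos)                # Phase 1
    interactions = phase2_seed_buffer(policy, env)   # Phase 2
    batch = sample_batch(demos, interactions)        # Phase 3 sampling
    print(f"{len(interactions)} seeded transitions, batch of {len(batch)}")
```

In this sketch the world-model and planning machinery of model-based RL is elided; the point is only the data flow the abstract describes: demonstrations shape the initial policy, that policy shapes the first interactions, and demonstrations keep a fixed share of every later training batch.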