Paper Title
Meta Automatic Curriculum Learning
Paper Authors
Paper Abstract
A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent, instead of randomly sampling tasks, as many could end up being either trivial or unfeasible. Since it is hard to obtain prior knowledge on such task spaces, many ACL algorithms explore the task space to detect progress niches over time, a costly tabula-rasa process that needs to be repeated for each new learning agent, even though agents might have similarities in their capability profiles. To address this limitation, we introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners, i.e., algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners. In this work, we present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL in multiple simulated environments, including procedurally generated parkour environments with learners of varying morphologies. Videos and code are available at https://sites.google.com/view/meta-acl.
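To make the ACL idea described in the abstract concrete, below is a minimal, hypothetical sketch of a learning-progress-based task sampler over a continuous task-parameter space: it biases task sampling toward regions where the learner's returns are currently changing. This is not the paper's AGAIN algorithm or its Meta-ACL formalization; the class name, binning scheme, and progress measure are illustrative assumptions only.

```python
# Hypothetical sketch of progress-based Automatic Curriculum Learning (ACL).
# Assumption: tasks are parameterized by a continuous vector in [low, high].
import numpy as np


class LearningProgressSampler:
    def __init__(self, low, high, bins_per_dim=5, explore_ratio=0.2, seed=0):
        self.low = np.asarray(low, dtype=float)    # lower bounds of task parameters
        self.high = np.asarray(high, dtype=float)  # upper bounds of task parameters
        self.bins = bins_per_dim
        self.explore_ratio = explore_ratio         # fraction of purely random tasks
        self.rng = np.random.default_rng(seed)
        n_cells = bins_per_dim ** len(self.low)
        self.recent = [[] for _ in range(n_cells)]  # recent episodic returns per cell
        self.older = [[] for _ in range(n_cells)]   # older episodic returns per cell

    def _cell(self, task):
        # Map a task-parameter vector to a discrete grid cell index.
        idx = np.floor((task - self.low) / (self.high - self.low) * self.bins)
        idx = np.clip(idx, 0, self.bins - 1).astype(int)
        return int(np.ravel_multi_index(idx, (self.bins,) * len(self.low)))

    def sample_task(self):
        # Occasionally sample uniformly to keep exploring the whole task space.
        if self.rng.random() < self.explore_ratio:
            return self.rng.uniform(self.low, self.high)
        # Otherwise, pick a cell proportionally to its absolute learning progress,
        # i.e., the change between older and recent average returns.
        alp = np.array([
            abs(np.mean(r) - np.mean(o)) if r and o else 0.0
            for r, o in zip(self.recent, self.older)
        ])
        if alp.sum() == 0.0:
            return self.rng.uniform(self.low, self.high)
        cell = self.rng.choice(len(alp), p=alp / alp.sum())
        # Sample a task uniformly inside the chosen cell.
        idx = np.array(np.unravel_index(cell, (self.bins,) * len(self.low)))
        width = (self.high - self.low) / self.bins
        return self.low + (idx + self.rng.random(len(self.low))) * width

    def update(self, task, episodic_return, window=20):
        # Record the return obtained by the learner on this task; progress is
        # estimated from the shift between the old and recent return windows.
        c = self._cell(np.asarray(task, dtype=float))
        self.recent[c].append(episodic_return)
        if len(self.recent[c]) > window:
            self.older[c].append(self.recent[c].pop(0))
            self.older[c] = self.older[c][-window:]
```

In a training loop, such a sampler would alternate `task = sampler.sample_task()`, running the learner on the generated task, and `sampler.update(task, episodic_return)`. The Meta-ACL setting described in the abstract goes further by reusing curriculum knowledge gathered on previous learners instead of restarting this tabula-rasa process for each new one.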