Lipschitz-constrained Unsupervised Skill Discovery

Paper title

Lipschitz-constrained Unsupervised Skill Discovery

Authors

Seohong Park, Jongwook Choi, Jaekyeom Kim, Honglak Lee, Gunhee Kim

Abstract

We study the problem of unsupervised skill discovery, whose goal is to learn a set of diverse and useful skills with no external reward. There have been a number of skill discovery methods based on maximizing the mutual information (MI) between skills and states. However, we point out that their MI objectives usually prefer static skills to dynamic ones, which may hinder the application for downstream tasks. To address this issue, we propose Lipschitz-constrained Skill Discovery (LSD), which encourages the agent to discover more diverse, dynamic, and far-reaching skills. Another benefit of LSD is that its learned representation function can be utilized for solving goal-following downstream tasks even in a zero-shot manner - i.e., without further training or complex planning. Through experiments on various MuJoCo robotic locomotion and manipulation environments, we demonstrate that LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks including the challenging task of following multiple goals on Humanoid. Our code and videos are available at https://shpark.me/projects/lsd/.
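
The abstract does not spell out the objective, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the representation function (here a hypothetical linear map `phi`) is made 1-Lipschitz by rescaling its weight matrix, that the intrinsic reward is the displacement in representation space projected onto the skill vector, and that zero-shot goal-following picks the skill pointing from the current state toward the goal in representation space. All function names are illustrative.

```python
import numpy as np


def make_lipschitz_linear(weight):
    """Rescale a weight matrix so that x -> weight @ x is 1-Lipschitz.

    Assumption: a linear map stands in for the learned representation.
    Its Lipschitz constant (in the L2 norm) is its largest singular value.
    """
    sigma = np.linalg.norm(weight, ord=2)  # largest singular value
    return weight / max(sigma, 1.0)


def phi(weight, state):
    """Hypothetical representation function: maps a state to skill space."""
    return weight @ state


def intrinsic_reward(weight, state, next_state, skill):
    """Assumed reward: representation displacement projected onto the skill."""
    return float((phi(weight, next_state) - phi(weight, state)) @ skill)


def zero_shot_skill(weight, state, goal, eps=1e-8):
    """Assumed goal-following rule: unit vector from phi(state) to phi(goal)."""
    direction = phi(weight, goal) - phi(weight, state)
    return direction / (np.linalg.norm(direction) + eps)


rng = np.random.default_rng(0)
W = make_lipschitz_linear(3.0 * rng.normal(size=(2, 4)))
s, g = rng.normal(size=4), rng.normal(size=4)
z = zero_shot_skill(W, s, g)
print(intrinsic_reward(W, s, g, z))
```

Because the map is 1-Lipschitz, a large reward under this sketch requires a large displacement in the state space itself, which is one way to read the abstract's claim that the method favors dynamic, far-reaching skills over static ones.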
